ETAPS Foreword


Welcome to the 23rd ETAPS! This is the first time that ETAPS took place in Ireland, in
its beautiful capital, Dublin.

ETAPS 2020 was the 23rd instance of the European Joint Conferences on Theory 
and Practice of Software. ETAPS is an annual federated conference established in 
1998, and consists of four conferences: ESOP, FASE, FoSSaCS, and TACAS. Each 
conference has its own Program Committee (PC) and its own Steering Committee 
(SC). The conferences cover various aspects of software systems, ranging from 
theoretical computer science to foundations of programming language developments, 
analysis tools, and formal approaches to software engineering. Organizing these 
conferences in a coherent, highly synchronized conference program enables researchers 
to participate in an exciting event, having the possibility to meet many colleagues 
working in different directions in the field, and to easily attend talks of different 
conferences. On the weekend before the main conference, numerous satellite 
workshops took place that attracted many researchers from all over the globe. Also, for 
the second time, an ETAPS Mentoring Workshop was organized. This workshop is 
intended to help students early in the program with advice on research, career, and life 
in the fields of computing that are covered by the ETAPS conference. 

ETAPS 2020 received 424 submissions in total, 129 of which were accepted, 
yielding an overall acceptance rate of 30.4%. I thank all the authors for their interest in 
ETAPS, all the reviewers for their reviewing efforts, the PC members for their 
contributions, and in particular the PC (co-)chairs for their hard work in running this 
entire intensive process. Last but not least, my congratulations to all authors of the 
accepted papers! 

ETAPS 2020 featured the unifying invited speakers Scott Smolka (Stony Brook 
University) and Jane Hillston (University of Edinburgh) and the conference-specific 
invited speakers (ESOP) Isil Dillig (University of Texas at Austin) and (FASE) Willem 
Visser (Stellenbosch University). Invited tutorials were provided by Erika Ábrahám
(RWTH Aachen University) on the analysis of hybrid systems and Madhusudan
Parthasarathy (University of Illinois at Urbana-Champaign) on combining Machine
Learning and Formal Methods. On behalf of the ETAPS 2020 attendees, I thank all the
speakers for their inspiring and interesting talks!

ETAPS 2020 took place in Dublin, Ireland, and was organized by the University of 
Limerick and Lero. ETAPS 2020 is further supported by the following associations and 
societies: ETAPS e.V., EATCS (European Association for Theoretical Computer 
Science), EAPLS (European Association for Programming Languages and Systems), 
and EASST (European Association of Software Science and Technology). The local 
organization team consisted of Tiziana Margaria (general chair, UL and Lero), 
Vasileios Koutavas (Lero@UCD), Anila Mjeda (Lero@UL), Anthony Ventresque 
(Lero@UCD), and Petros Stratis (Easy Conferences).




The ETAPS Steering Committee (SC) consists of an Executive Board, and 
representatives of the individual ETAPS conferences, as well as representatives of 
EATCS, EAPLS, and EASST. The Executive Board consists of Holger Hermanns 
(Saarbrücken), Marieke Huisman (chair, Twente), Joost-Pieter Katoen (Aachen and 
Twente), Jan Kofroň (Prague), Gerald Lüttgen (Bamberg), Tarmo Uustalu (Reykjavik
and Tallinn), Caterina Urban (Inria, Paris), and Lenore Zuck (Chicago). 

Other members of the SC are: Armin Biere (Linz), Jordi Cabot (Barcelona), Jean 
Goubault-Larrecq (Cachan), Jan-Friso Groote (Eindhoven), Esther Guerra (Madrid), 
Jurriaan Hage (Utrecht), Reiko Heckel (Leicester), Panagiotis Katsaros (Thessaloniki), 
Stefan Kiefer (Oxford), Barbara König (Duisburg), Fabrice Kordon (Paris), Jan 
Kretinsky (Munich), Kim G. Larsen (Aalborg), Tiziana Margaria (Limerick), Peter 
Müller (Zurich), Catuscia Palamidessi (Palaiseau), Dave Parker (Birmingham), 
Andrew M. Pitts (Cambridge), Peter Ryan (Luxembourg), Don Sannella (Edinburgh), 
Bernhard Steffen (Dortmund), Mariëlle Stoelinga (Twente), Gabriele Taentzer
(Marburg), Christine Tasson (Paris), Peter Thiemann (Freiburg), Jan Vitek (Prague), 
Heike Wehrheim (Paderborn), Anton Wijs (Eindhoven), and Nobuko Yoshida 
(London). 

I would like to take this opportunity to thank all speakers, attendees, organizers
of the satellite workshops, and Springer for their support. I hope you all enjoyed 
ETAPS 2020. Finally, a big thanks to Tiziana and her local organization team for all 
their enormous efforts enabling a fantastic ETAPS in Dublin! 


February 2020 Marieke Huisman 
ETAPS SC Chair 
ETAPS e.V. President 


Preface 


Welcome to the European Symposium on Programming (ESOP 2020)! The 29th 
edition of this conference series was initially planned to be held April 27-30, 2020, in 
Dublin, Ireland, but was then moved to fall 2020 due to the COVID-19 outbreak. 
ESOP is one of the European Joint Conferences on Theory and Practice of Software 
(ETAPS). It is devoted to fundamental issues in the specification, design, analysis, and 
implementation of programming languages and systems. 

This volume contains 27 papers, which the Program Committee (PC) selected 
among 87 submissions. Each submission received between three and six reviews. After 
an author response period, the papers were discussed electronically among the PC 
members and external reviewers. The one paper for which the PC chair had a conflict of 
interest was kindly handled by Sasa Misailovic. 

Submissions authored by a PC member were held to slightly higher standards: they 
received at least four reviews, had an external reviewer, and were accepted only if they 
were not involved in comparisons of relative merit with other submissions. We 
accepted two out of four PC submissions. 

The final program includes a keynote by Isil Dillig on “Formal Methods for 
Evolving Database Applications.” 

Any conference depends first and foremost on the quality of its submissions. I would 
like to thank all the authors who submitted their work to ESOP 2020! I am truly 
impressed by the members of the PC. They produced insightful and constructive 
reviews, contributed very actively to the online discussions, and were extremely 
helpful. It was an honor to work with all of you! I am also grateful to the external 
reviewers, who provided their expert opinions and helped tremendously to reach 
well-informed decisions. I would like to thank everybody who contributed to the 
organization of ESOP 2020, especially the ESOP 2020 Steering Committee and its 
chair Peter Thiemann as well as the ETAPS 2020 Steering Committee and its chair 
Marieke Huisman, who provided help and guidance on numerous occasions. Finally, 
I'd like to thank Linard Arquint and Vasileios Koutavas for their help with the
proceedings. 
Formal Methods for Evolving
Database Applications 
(Abstract of Keynote Talk) 


Işıl Dillig 
isil@cs.utexas.edu


Many database applications undergo significant schema changes during their life cycle 
due to performance or maintainability reasons. Examples of such schema changes 
include denormalization, splitting a single table into multiple tables, and consolidating 
multiple tables into a single table. Even though such schema refactorings are quite 
common in practice, programmers need to spend significant time and effort to 
re-implement parts of the code base that are affected by the schema change. Further- 
more, it is not uncommon to introduce bugs during this code transformation process. 

In this talk, I will present our recent work on using formal methods to simplify the 
schema refactoring process for evolving database applications. Specifically, I will first 
propose a definition of equivalence between database applications that operate over 
different schemas. Building on this definition, I will then present a fully automated 
technique for proving equivalence between a pair of applications. Our verification 
technique is capable of automatically synthesizing bisimulation invariants between two 
database applications and uses the inferred bisimulation invariant to automatically 
prove equivalence. 

In the next part of the talk, I will explain how to leverage this verification technique 
to completely automate the code migration process. Specifically, given an original 
database application P over schema S and a new schema S’, I will discuss a practical 
program synthesis technique that can be used to generate a new program P’ over 
schema S’ such that P and P’ are provably equivalent. In particular, I will first present a 
method for generating a program sketch of the new version; then, I will describe a 
novel synthesis algorithm that efficiently explores the space of all programs that are in 
the search space of the generated sketch. 

Finally, I will describe experimental results on a suite of schema refactoring 
benchmarks, including real-world database applications written in Ruby-on-Rails. 
I will also outline remaining challenges in this area and motivate future research 
directions relevant to research in programming languages and formal methods. 
Abstract. Compiler correctness is, in its simplest form, defined as the inclusion
of the set of traces of the compiled program into the set of traces of the origi- 
nal program, which is equivalent to the preservation of all trace properties. Here 
traces collect, for instance, the externally observable events of each execution. 
This definition requires, however, the set of traces of the source and target lan- 
guages to be exactly the same, which is not the case when the languages are far 
apart or when observations are fine-grained. To overcome this issue, we study a 
generalized compiler correctness definition, which uses source and target traces 
drawn from potentially different sets and connected by an arbitrary relation. We 
set out to understand what guarantees this generalized compiler correctness defi- 
nition gives us when instantiated with a non-trivial relation on traces. When this 
trace relation is not equality, it is no longer possible to preserve the trace prop- 
erties of the source program unchanged. Instead, we provide a generic charac- 
terization of the target trace property ensured by correctly compiling a program 
that satisfies a given source property, and dually, of the source trace property one 
is required to show in order to obtain a certain target property for the compiled 
code. We show that this view on compiler correctness can naturally account for 
undefined behavior, resource exhaustion, different source and target values, side- 
channels, and various abstraction mismatches. Finally, we show that the same 
generalization also applies to many secure compilation definitions, which char- 
acterize the protection of a compiled program against linked adversarial code. 


1 Introduction 


Compiler correctness is an old idea [37, 40, 41] that has seen a significant revival in re- 
cent times. This new wave was started by the creation of the CompCert verified C com- 
piler [33] and continued by the proposal of many significant extensions and variants of 
CompCert [8, 9, 12, 23, 29, 30, 42, 52, 56, 57, 61] and the success of many other mile- 
stone compiler verification projects, including Vellvm [64], Pilsner [45], CakeML [58], 
CertiCoq [4], etc. Yet, even for these verified compilers, the precise statement of
correctness matters. Since proof assistants are used to conduct the verification, an external
observer does not have to understand the proofs in order to trust them, but one still has 
to deeply understand the statement that was proved. And this is true not just for correct 
compilation, but also for secure compilation, which is the more recent idea that our 
compilation chains should do more to also ensure security of our programs [3, 26]. 


Basic Compiler Correctness. The gold standard for compiler correctness is semantic 
preservation, which intuitively says that the semantics of a compiled program (in the 
target language) is compatible with the semantics of the original program (in the source 


© The Author(s) 2020 
language). For practical verified compilers, such as CompCert [33] and CakeML [58],
semantic preservation is stated extrinsically, by referring to traces. In these two settings, 
a trace is an ordered sequence of events—such as inputs from and outputs to an external 
environment—that are produced by the execution of a program. 

A basic definition of compiler correctness can be given by the set inclusion of the 
traces of the compiled program into the traces of the original program. Formally [33]: 


Definition 1.1 (Basic Compiler Correctness (CC)). A compiler $\downarrow$ is correct iff
$\forall W\ t.\ W{\downarrow} \rightsquigarrow t \Rightarrow W \rightsquigarrow t.$


This definition says that for any whole¹ source program $W$, if we compile it (denoted
$W{\downarrow}$), execute it with respect to the semantics of the target language, and observe a trace
$t$, then the original $W$ can produce the same trace $t$ with respect to the semantics of
the source language.² This definition is simple and easy to understand, since it only
references a few familiar concepts: a compiler between a source and a target language, 
each equipped with a trace-producing semantics (usually nondeterministic). 
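
As a minimal sketch of Definition 1.1, assume each language's semantics is given as a finite table from program names to sets of traces. The names `SRC`, `TGT`, `W_ok`, and `W_bad` are hypothetical and only illustrate the trace-inclusion check; real semantics are not finite tables.

```python
from typing import Dict, FrozenSet, Tuple

Trace = Tuple[str, ...]                  # a trace: finite sequence of events
Semantics = Dict[str, FrozenSet[Trace]]  # program name -> traces it can produce


def basic_cc(src: Semantics, tgt: Semantics, compiled: Dict[str, str]) -> bool:
    """Check: for all W and t, if the compiled W produces t then W produces t."""
    return all(tgt[compiled[w]] <= src[w] for w in src)


# Hypothetical nondeterministic source program: it may output 1 or 2.
SRC: Semantics = {"W": frozenset({("out 1",), ("out 2",)})}
# One compilation resolves the nondeterminism; the other invents a new trace.
TGT: Semantics = {
    "W_ok": frozenset({("out 1",)}),
    "W_bad": frozenset({("out 3",)}),
}

print(basic_cc(SRC, TGT, {"W": "W_ok"}))   # True: target traces are source traces
print(basic_cc(SRC, TGT, {"W": "W_bad"}))  # False: ("out 3",) has no source counterpart
```

The check is an inclusion rather than an equality, so a compiler may legitimately remove nondeterminism.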


Beyond Basic Compiler Correctness. This basic compiler correctness definition as- 
sumes that any trace produced by a compiled program can be produced by the source 
program. This is a very strict requirement, and in particular implies that the source and 
target traces are drawn from the same set and that the same source trace corresponds 
to a given target trace. These assumptions are often too strong, and hence in practice 
verified compiler efforts use different formulations of compiler correctness: 

CompCert [33] The original compiler correctness theorem of CompCert [33] can be 
seen as an instance of basic compiler correctness, but it does not provide any guar- 
antees for programs that can exhibit undefined behavior [53]. As allowed by the 
C standard, such unsafe programs are not even considered to be in the source lan- 
guage, so are not quantified over. This has important practical implications, since 
undefined behavior often leads to exploitable security vulnerabilities [13, 24, 25] 
and serious confusion even among experienced C and C++ developers [32, 53, 59, 
60]. As such, since 2010, CompCert provides an additional top-level correctness 
theorem³ that better accounts for the presence of unsafe programs by providing
guarantees for them up to the point when they encounter undefined behavior [53]. 
This new theorem goes beyond the basic correctness definition above, as a target 
trace need only correspond to a source trace up to the occurrence of undefined 
behavior in the source trace. 

CakeML [58] Compiler correctness for CakeML accounts for memory exhaustion in 
target executions. Crucially, memory exhaustion events cannot occur in source 
traces, only in target traces. Hence, dually to CompCert, compiler correctness only 
requires source and target traces to coincide up to the occurrence of a memory 
exhaustion event in the target trace. 


¹ For simplicity, for now we ignore separate compilation and linking, returning to it in §5.

² Typesetting convention [47]: we use a blue, sans-serif font for source elements, an orange,
bold font for target ones and a black, italic font for elements common to both languages.

³ Stated at the top of the CompCert file driver/Complements.v and discussed by Regehr [53].
Trace-Relating Compiler Correctness. Generalized formalizations of compiler correctness
like the ones above can be naturally expressed as instances of a uniform definition,
which we call trace-relating compiler correctness. This generalizes basic compiler
correctness by (a) considering that source and target traces belong to possibly distinct
sets $\mathit{Trace}_S$ and $\mathit{Trace}_T$, and (b) being parameterized by an arbitrary trace relation $\sim$.


Definition 1.2 (Trace-Relating Compiler Correctness ($CC^{\sim}$)). A compiler $\downarrow$ is correct
with respect to a trace relation ${\sim} \subseteq \mathit{Trace}_S \times \mathit{Trace}_T$ iff

$\forall W.\ \forall t.\ W{\downarrow} \rightsquigarrow t \Rightarrow \exists s \sim t.\ W \rightsquigarrow s.$


This definition requires that, for any target trace $t$ produced by the compiled program
$W{\downarrow}$, there exist a source trace $s$ that can be produced by the original program $W$ and is
related to $t$ according to $\sim$ (i.e., $s \sim t$). By choosing the trace relation appropriately,
one can recover the different notions of compiler correctness presented above:

Basic CC Take $s \sim t$ to be $s = t$. Trivially, the basic CC of Definition 1.1 is $CC^{=}$.

CompCert Undefined behavior is modeled in CompCert as a trace-terminating event
Goes_wrong that can occur in any of its languages (source, target, and all intermediate
languages), so for a given phase (or composition thereof), we have
$\mathit{Trace}_S = \mathit{Trace}_T$. Nevertheless, the relation between source and target traces
with which to instantiate $CC^{\sim}$ to obtain CompCert’s current theorem is:

$s \sim t \equiv s = t \lor (\exists m \leq t.\ s = m \cdot \texttt{Goes\_wrong}).$

A compiler satisfying $CC^{\sim}$ for this trace relation can turn a source trace ending
in undefined behavior $m \cdot \texttt{Goes\_wrong}$ (where “$\cdot$” is concatenation) either into the
same trace in the target (first disjunct), or into a target trace that starts with the
prefix $m$ but then continues arbitrarily (second disjunct, “$\leq$” is the prefix relation).

CakeML Here, target traces are sequences of symbols from an alphabet $\Sigma_T$ that has
a specific trace-terminating event, Resource_limit_hit, which is not available
in the source alphabet $\Sigma_S$ (i.e., $\Sigma_T = \Sigma_S \cup \{\texttt{Resource\_limit\_hit}\}$). Then, the
compiler correctness theorem of CakeML can be obtained by instantiating $CC^{\sim}$
with the following $\sim$ relation:

$s \sim t \equiv s = t \lor (\exists m \leq s.\ t = m \cdot \texttt{Resource\_limit\_hit}).$

The resulting $CC^{\sim}$ instance relates a target trace ending in Resource_limit_hit
after executing $m$ to a source trace that first produces $m$ and then continues in a
way given by the semantics of the source program.
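
The instances above can be sketched executably, assuming finite traces encoded as tuples of event names and checking a single program at a time. The function names (`cc_rel`, `compcert_rel`, `cakeml_rel`) and the sample traces are illustrative assumptions, not taken from CompCert or CakeML.

```python
from typing import Callable, Iterable, Tuple

Trace = Tuple[str, ...]
Rel = Callable[[Trace, Trace], bool]  # rel(s, t) encodes s ~ t


def is_prefix(m: Trace, t: Trace) -> bool:
    return t[: len(m)] == m


def cc_rel(src_traces: Iterable[Trace], tgt_traces: Iterable[Trace], rel: Rel) -> bool:
    """For one program W: every trace of the compiled W is related to some trace of W."""
    src = list(src_traces)
    return all(any(rel(s, t) for s in src) for t in tgt_traces)


def equality(s: Trace, t: Trace) -> bool:
    return s == t


def compcert_rel(s: Trace, t: Trace) -> bool:
    # s = t, or s = m . Goes_wrong for some prefix m of t
    return s == t or (s[-1:] == ("Goes_wrong",) and is_prefix(s[:-1], t))


def cakeml_rel(s: Trace, t: Trace) -> bool:
    # s = t, or t = m . Resource_limit_hit for some prefix m of s
    return s == t or (t[-1:] == ("Resource_limit_hit",) and is_prefix(t[:-1], s))


# Source hits undefined behavior after "in"; the target continues arbitrarily.
ub_src, ub_tgt = [("in", "Goes_wrong")], [("in", "out 42")]
# Target runs out of memory after the first output of the source run.
oom_src, oom_tgt = [("out 1", "out 2")], [("out 1", "Resource_limit_hit")]

print(cc_rel(ub_src, ub_tgt, equality))       # False
print(cc_rel(ub_src, ub_tgt, compcert_rel))   # True
print(cc_rel(oom_src, oom_tgt, cakeml_rel))   # True
```

The two relations are mirror images: the CompCert one lets the source trace stop early, the CakeML one lets the target trace stop early.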


Beyond undefined behavior and resource exhaustion, there are many other practical
uses for $CC^{\sim}$: in this paper we show that it also accounts for differences between source
and target values, for a single source output being turned into a series of target outputs,
and for side-channels.

On the flip side, the compiler correctness statement and its implications can be
more difficult to understand for $CC^{\sim}$ than for $CC^{=}$. The full implications of choosing a
particular $\sim$ relation can be subtle. In fact, using a bad relation can make the compiler
correctness statement trivial or unexpected. For instance, it should be easy to see that
if one uses the total relation, which relates all source traces to all target ones, the $CC^{\sim}$
property holds for every compiler, yet it might take one a bit more effort to understand
that the same is true even for the following relation:


$s \sim t \equiv \exists W.\ W \rightsquigarrow s \land W{\downarrow} \rightsquigarrow t.$
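
To see why this relation trivializes the statement, here is a small sketch under two assumptions: semantics are finite tables, and every source program has at least one trace. The "compiler" below is deliberately wrong (hypothetical names `W` and `W_garbage`), yet the check succeeds because the relation is built from the compiler's own behavior.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Trace = Tuple[str, ...]
Semantics = Dict[str, FrozenSet[Trace]]


def degenerate_rel(src: Semantics, tgt: Semantics,
                   compiled: Dict[str, str]) -> Callable[[Trace, Trace], bool]:
    """s ~ t iff some program W produces s while its compilation produces t."""
    def rel(s: Trace, t: Trace) -> bool:
        return any(s in src[w] and t in tgt[compiled[w]] for w in src)
    return rel


def cc_rel(src: Semantics, tgt: Semantics, compiled: Dict[str, str],
           rel: Callable[[Trace, Trace], bool]) -> bool:
    """Trace-relating correctness over all programs in the (finite) table."""
    return all(any(rel(s, t) for s in src[w])
               for w in src for t in tgt[compiled[w]])


SRC: Semantics = {"W": frozenset({("out 1",)})}
TGT: Semantics = {"W_garbage": frozenset({("out 666",)})}
COMPILED = {"W": "W_garbage"}

holds = cc_rel(SRC, TGT, COMPILED, degenerate_rel(SRC, TGT, COMPILED))
print(holds)  # True, although the compiled program shares no behavior with W
```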
Reasoning About Trace Properties. To understand more about a particular $CC^{\sim}$ instance,
we propose to also look at how it preserves trace properties (defined as sets of
allowed traces [31]) from the source to the target. For instance, it is well known that
$CC^{=}$ is equivalent to the preservation of all trace properties (where $W \models \pi$ reads “$W$
satisfies $\pi$” and stands for $\forall t.\ W \rightsquigarrow t \Rightarrow t \in \pi$):

$CC^{=} \iff \forall \pi \in 2^{\mathit{Trace}}.\ \forall W.\ W \models \pi \Rightarrow W{\downarrow} \models \pi.$

However, to the best of our knowledge, similar results have not been formulated for 
trace relations beyond equality, when it is no longer possible to preserve the trace prop- 
erties of the source program unchanged. For trace-relating compiler correctness, where 
source and target traces can be drawn from different sets and related by an arbitrary 
trace relation, there are two crucial questions to ask: 

1. For a source trace property $\pi_S$ of a program (established for instance by formal
verification), what is the strongest target property that any $CC^{\sim}$ compiler is
guaranteed to ensure for the produced target program?

2. For a target trace property $\pi_T$, what is the weakest source property we need to show
of the original source program to obtain $\pi_T$ for the result of any $CC^{\sim}$ compiler?

Far from being mere hypothetical questions, they can help the developer of a verified 
compiler to better understand the compiler correctness theorem they are proving, and 
we expect that any user of such a compiler will need to ask either one or the other if they 
are to make use of that theorem. In this work we provide a simple and natural answer to 
these questions, for any instance of $CC^{\sim}$. Building upon a bijection between relations
and Galois connections [5, 20, 43], we observe that any trace relation $\sim$ corresponds
to two property mappings $\tilde{\tau}$ and $\tilde{\sigma}$, which are functions mapping source properties to
target ones ($\tau$ standing for “to target”) and target properties to source ones ($\sigma$ standing
for “to source”):

$\tilde{\tau}(\pi_S) = \{t \mid \exists s.\ s \sim t \land s \in \pi_S\}; \qquad \tilde{\sigma}(\pi_T) = \{s \mid \forall t.\ s \sim t \Rightarrow t \in \pi_T\}.$

The existential image of $\sim$, $\tilde{\tau}$, answers the first question above by mapping a given
source property $\pi_S$ to the target property that contains all target traces for which there
exists a related source trace that satisfies $\pi_S$. Dually, the universal image of $\sim$, $\tilde{\sigma}$,
answers the second question by mapping a given target property $\pi_T$ to the source property
that contains all source traces for which all related target traces satisfy $\pi_T$. We
introduce two new correct compilation definitions in terms of trace property preservation
(TP): $TP^{\tilde{\tau}}$ quantifies over all source trace properties and uses $\tilde{\tau}$ to obtain the
corresponding target properties. $TP^{\tilde{\sigma}}$ quantifies over all target trace properties and uses $\tilde{\sigma}$
to obtain the corresponding source properties. We prove that these two definitions are
equivalent to $CC^{\sim}$, yielding a novel trinitarian view of compiler correctness (Figure 1).
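
The two images can be computed directly when the trace sets are finite. The following sketch assumes a toy universe of one-event traces and a simplified resource-exhaustion relation; `SRC_TRACES`, `TGT_TRACES`, and `rel` are hypothetical and chosen only to make the mappings concrete.

```python
from typing import FrozenSet, Tuple

Trace = Tuple[str, ...]
RLH: Trace = ("Resource_limit_hit",)

SRC_TRACES = [("ok",), ("bad",)]
TGT_TRACES = [("ok",), ("bad",), RLH]


def rel(s: Trace, t: Trace) -> bool:
    # s ~ t iff s = t, or the target ran out of resources before emitting anything
    return s == t or t == RLH


def tau_img(pi_s: FrozenSet[Trace]) -> FrozenSet[Trace]:
    """Existential image: target traces related to SOME source trace in pi_s."""
    return frozenset(t for t in TGT_TRACES if any(rel(s, t) for s in pi_s))


def sigma_img(pi_t: FrozenSet[Trace]) -> FrozenSet[Trace]:
    """Universal image: source traces whose related target traces ALL lie in pi_t."""
    return frozenset(s for s in SRC_TRACES
                     if all(t in pi_t for t in TGT_TRACES if rel(s, t)))


never_bad = frozenset({("ok",)})
print(tau_img(never_bad))                      # contains ("ok",) and RLH
print(sigma_img(frozenset({("ok",)})))         # frozenset(): no source proof suffices
print(sigma_img(frozenset({("ok",), RLH})))    # contains only ("ok",)
```

In this toy setting the target guarantee for "never bad" must also allow resource exhaustion, and a target property that forbids exhaustion has an unsatisfiable source obligation.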


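$CC^{\sim} \equiv \forall W.\ \forall t.\ W{\downarrow} \rightsquigarrow t \Rightarrow \exists s \sim t.\ W \rightsquigarrow s$

$TP^{\tilde{\sigma}} \equiv \forall \pi_T.\ \forall W.\ W \models \tilde{\sigma}(\pi_T) \Rightarrow W{\downarrow} \models \pi_T$

$TP^{\tilde{\tau}} \equiv \forall \pi_S.\ \forall W.\ W \models \pi_S \Rightarrow W{\downarrow} \models \tilde{\tau}(\pi_S)$

$CC^{\sim} \iff TP^{\tilde{\sigma}} \iff TP^{\tilde{\tau}}$


Fig. 1: The equivalent compiler correctness definitions forming our trinitarian view.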
Contributions.


We propose a new trinitarian view of compiler correctness that accounts for non-trivial
trace relations. While, as discussed above, specific instances of the $CC^{\sim}$ definition have
already been used in practice, we seem to be the first to propose assessing the
meaningfulness of $CC^{\sim}$ instances in terms of how properties are preserved between the source
and the target, and in particular by looking at the property mappings $\tilde{\sigma}$ and $\tilde{\tau}$ induced
by the trace relation $\sim$. We prove that $CC^{\sim}$, $TP^{\tilde{\tau}}$, and $TP^{\tilde{\sigma}}$ are equivalent for any
trace relation (§2.2), as illustrated in Figure 1. In the opposite direction, we show that
for every trace relation corresponding to a given Galois connection [20], an analogous
equivalence holds. Finally, we extend these results (§2.3) from the preservation of trace
properties to the larger class of subset-closed hyperproperties (e.g., noninterference).

We use $CC^{\sim}$ compilers of various complexities to illustrate that our view on compiler
correctness naturally accounts for undefined behavior (§3.1), resource exhaustion
(§3.2), different source and target values (§3.3), and differences in the granularity of
data and observable events (§3.4). We expect these ideas to apply to any other
discrepancies between source and target traces. For each compiler we show how to choose
the relation between source and target traces and how the induced property mappings
preserve interesting trace properties and subset-closed hyperproperties. We look at the
way particular $\tilde{\sigma}$ and $\tilde{\tau}$ work on different kinds of properties and how the produced
properties can be expressed for different kinds of traces.

We analyze the impact of correct compilation on noninterference [22], showing what 
can still be preserved (and thus also what is lost) when target observations are finer than 
source ones, e.g., side-channel observations (§4). We formalize the guarantee obtained 
by correct compilation of a noninterfering program as abstract noninterference [21], a 
weakening of target noninterference. Dually, we identify a family of declassifications 
of target noninterference for which source reasoning is possible. 

Finally, we show that the trinitarian view also extends to a large class of secure com- 
pilation definitions [2], formally characterizing the protection of the compiled program 
against linked adversarial code (§5). For each secure compilation definition we again 
propose both a property-free characterization in the style of CC~, and two character- 
izations in terms of preserving a class of source or target properties satisfied against 
arbitrary adversarial contexts. The additional quantification over contexts allows for 
finer distinctions when considering different property classes, so we study mapping 
classes not only of trace properties and hyperproperties, but also of relational hyper- 
properties [2]. An example secure compiler accounting for a target that can produce 
additional trace events that are not possible in the source illustrates this approach. 


The paper closes with discussions of related (§6) and future work (§7). An online
appendix contains omitted technical details: https://arxiv.org/abs/1907.05320.


The traces considered in our examples are structured, usually as sequences of events. 
We notice however that unless explicitly mentioned, all our definitions and results are 
more general and make no assumption whatsoever about the structure of traces. Most 
of the theorems formally or informally mentioned in the paper were mechanized in the 
Coq proof assistant and are marked with # . This development has around 10k lines of 
code, is described in the online appendix, and is available at the following address: 
https://github.com/secure-compilation/different_traces. 
2 Trace-Relating Compiler Correctness


In this section, we start by generalizing the trace property preservation definitions at
the end of the introduction to $TP^{\sigma}$ and $TP^{\tau}$, which depend on two arbitrary mappings
$\sigma$ and $\tau$ (§2.1). We prove that, whenever $\sigma$ and $\tau$ form a Galois connection, $TP^{\sigma}$ and
$TP^{\tau}$ are equivalent (Theorem 2.4). We then exploit a bijective correspondence between
trace relations and Galois connections to close the trinitarian view (§2.2), with two main
benefits: first, it helps us assess the meaningfulness of a given trace relation by looking
at the property mappings it induces; second, it allows us to construct new compiler
correctness definitions starting from a desired mapping of properties. Finally, we
generalize the classic result that compiler correctness (i.e., $CC^{=}$) is enough to preserve not
just trace properties but also all subset-closed hyperproperties [14]. For this, we show
that $CC^{\sim}$ is also equivalent to subset-closed hyperproperty preservation, for which we
also define both a version in terms of $\tilde{\sigma}$ and a version in terms of $\tilde{\tau}$ (§2.3).


2.1 Property Mappings 


As explained in §1, trace-relating compiler correctness $CC^{\sim}$, by itself, lacks a crisp
description of which trace properties are preserved by compilation. Since even the syntax
of traces can differ between source and target, one can either look at trace properties of
the source (but then one needs to interpret them in the target), or at trace properties of
the target (but then one needs to interpret them in the source). Formally we need two
property mappings, $\tau : 2^{\mathit{Trace}_S} \to 2^{\mathit{Trace}_T}$ and $\sigma : 2^{\mathit{Trace}_T} \to 2^{\mathit{Trace}_S}$, which lead us
to the following generalization of trace property preservation (TP).


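Definition 2.1 ($TP^{\tau}$ and $TP^{\sigma}$). Given two property mappings, $\tau : 2^{\mathit{Trace}_S} \to 2^{\mathit{Trace}_T}$
and $\sigma : 2^{\mathit{Trace}_T} \to 2^{\mathit{Trace}_S}$, for a compilation chain $\cdot{\downarrow}$ we define:


$TP^{\tau} \equiv \forall \pi_S.\ \forall W.\ W \models \pi_S \Rightarrow W{\downarrow} \models \tau(\pi_S); \qquad TP^{\sigma} \equiv \forall \pi_T.\ \forall W.\ W \models \sigma(\pi_T) \Rightarrow W{\downarrow} \models \pi_T.$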


For an arbitrary source program $W$, $\tau$ interprets a source property $\pi_S$ as the target
guarantee for $W{\downarrow}$. Dually, $\sigma$ defines a source obligation sufficient for the satisfaction
of a target property $\pi_T$ after compilation. Ideally:
- Given $\pi_T$, the target interpretation of the source obligation $\sigma(\pi_T)$ should actually
guarantee that $\pi_T$ holds, i.e., $\tau(\sigma(\pi_T)) \subseteq \pi_T$;
- Dually for $\pi_S$, we would not want the source obligation for $\tau(\pi_S)$ to be harder than
$\pi_S$ itself, i.e., $\sigma(\tau(\pi_S)) \supseteq \pi_S$.
These requirements are satisfied when the two maps form a Galois connection between 
the posets of source and target properties ordered by inclusion. We briefly recall the 
definition and the characteristic property of Galois connections [16, 38]. 


Definition 2.2 (Galois connection). Let $(X, \preceq)$ and $(Y, \sqsubseteq)$ be two posets. A pair of
maps, $\alpha : X \to Y$, $\gamma : Y \to X$ is a Galois connection iff it satisfies the adjunction law:
$\forall x \in X.\ \forall y \in Y.\ \alpha(x) \sqsubseteq y \iff x \preceq \gamma(y)$. $\alpha$ (resp. $\gamma$) is the lower (upper) adjoint
or abstraction (concretization) function and $Y$ ($X$) the abstract (concrete) domain.


We will often write $\alpha : (X, \preceq) \leftrightarrows (Y, \sqsubseteq) : \gamma$ to denote a Galois connection, or simply
$\alpha : X \leftrightarrows Y : \gamma$, or even $\alpha \leftrightarrows \gamma$ when the involved posets are clear from context.
Lemma 2.3 (Characteristic property of Galois connections). If $\alpha : (X, \preceq) \leftrightarrows (Y, \sqsubseteq) : \gamma$
is a Galois connection, then $\alpha$, $\gamma$ are monotone and they satisfy these properties:

i) $\forall x \in X.\ x \preceq \gamma(\alpha(x))$; ii) $\forall y \in Y.\ \alpha(\gamma(y)) \sqsubseteq y$.
If $X$, $Y$ are complete lattices, then $\alpha$ is continuous, i.e., $\forall F \subseteq X.\ \alpha(\bigsqcup F) = \bigsqcup \alpha(F)$.


If two property mappings, $\tau$ and $\sigma$, form a Galois connection on trace properties ordered
by set inclusion, Lemma 2.3 (with $\alpha = \tau$ and $\gamma = \sigma$) tells us that they satisfy the ideal
conditions we discussed above, i.e., $\tau(\sigma(\pi_T)) \subseteq \pi_T$ and $\sigma(\tau(\pi_S)) \supseteq \pi_S$.⁴
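
As a sanity check on a small instance, the adjunction law and the two ideal conditions can be verified by brute force. The sketch assumes tiny finite trace sets `S` and `T` and a hypothetical relation `REL`, and takes the existential and universal images of `REL` as the pair of mappings; it illustrates the statements rather than proving them.

```python
from itertools import chain, combinations

S = ["s1", "s2"]
T = ["t1", "t2", "t3"]
REL = {("s1", "t1"), ("s1", "t3"), ("s2", "t2"), ("s2", "t3")}  # hypothetical


def powerset(xs):
    xs = list(xs)
    subsets = chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))
    return [frozenset(c) for c in subsets]


def tau(pi_s):
    """Existential image of REL."""
    return frozenset(t for t in T if any((s, t) in REL for s in pi_s))


def sigma(pi_t):
    """Universal image of REL."""
    return frozenset(s for s in S if all(t in pi_t for t in T if (s, t) in REL))


# Adjunction law: tau(ps) included in pt  iff  ps included in sigma(pt).
adjunction = all((tau(ps) <= pt) == (ps <= sigma(pt))
                 for ps in powerset(S) for pt in powerset(T))
# The two ideal conditions (properties i and ii of Lemma 2.3).
guarantee_sound = all(tau(sigma(pt)) <= pt for pt in powerset(T))
obligation_not_harder = all(ps <= sigma(tau(ps)) for ps in powerset(S))

print(adjunction, guarantee_sound, obligation_not_harder)  # True True True
```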

The two ideal conditions on $\tau$ and $\sigma$ are sufficient to show the equivalence of the
criteria they define, respectively $TP^{\tau}$ and $TP^{\sigma}$.


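Theorem 2.4 ($TP^{\sigma}$ and $TP^{\tau}$ coincide #). Let $\tau : 2^{\mathit{Trace}_S} \leftrightarrows 2^{\mathit{Trace}_T} : \sigma$ be a Galois
connection, with $\tau$ and $\sigma$ the lower and upper adjoints (resp.). Then $TP^{\tau} \iff TP^{\sigma}$.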
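
One way to see why the two ideal conditions suffice is the following sketch of the standard argument (not necessarily the structure of the mechanized proof). It uses only $\tau(\sigma(\pi_T)) \subseteq \pi_T$, $\pi_S \subseteq \sigma(\tau(\pi_S))$, and the fact that $W \models \pi$ still holds when $\pi$ is enlarged.

```latex
% Sketch: each direction instantiates one criterion and applies one ideal condition.
\begin{align*}
(TP^{\tau} \Rightarrow TP^{\sigma})\quad
  & W \models \sigma(\pi_T)
    \;\overset{TP^{\tau}}{\Longrightarrow}\;
    W{\downarrow} \models \tau(\sigma(\pi_T))
    \;\overset{\tau(\sigma(\pi_T)) \subseteq \pi_T}{\Longrightarrow}\;
    W{\downarrow} \models \pi_T \\
(TP^{\sigma} \Rightarrow TP^{\tau})\quad
  & W \models \pi_S
    \;\overset{\pi_S \subseteq \sigma(\tau(\pi_S))}{\Longrightarrow}\;
    W \models \sigma(\tau(\pi_S))
    \;\overset{TP^{\sigma}}{\Longrightarrow}\;
    W{\downarrow} \models \tau(\pi_S)
\end{align*}
```

In the first line $TP^{\tau}$ is instantiated with $\pi_S = \sigma(\pi_T)$; in the second, $TP^{\sigma}$ is instantiated with $\pi_T = \tau(\pi_S)$.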


2.2 Trace Relations and Property Mappings 


We now investigate the relation between $CC^{\sim}$, $TP^{\tau}$ and $TP^{\sigma}$. We show that for a trace
relation and its corresponding Galois connection (Lemma 2.7), the three criteria are
equivalent (Theorem 2.8). This equivalence offers interesting insights for both
verification and design of a correct compiler. For a $CC^{\sim}$ compiler, the equivalence makes
explicit both the guarantees one has after compilation ($\tau$) and source proof obligations
to ensure the satisfaction of a given target property ($\sigma$). On the other hand, a compiler
designer might first determine the target guarantees the compiler itself must provide,
i.e., $\tau$, and then prove an equivalent statement, $CC^{\sim}$, for which more convenient proof
techniques exist in the literature [7, 58].


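Definition 2.5 (Existential and Universal Image [20]). Given any two sets $X$ and $Y$
and a relation ${\sim} \subseteq X \times Y$, define its existential or direct image, $\tilde{\tau} : 2^{X} \to 2^{Y}$ and its
universal image, $\tilde{\sigma} : 2^{Y} \to 2^{X}$ as follows:

$\tilde{\tau} = \lambda \pi \in 2^{X}.\ \{y \mid \exists x.\ x \sim y \land x \in \pi\}; \qquad \tilde{\sigma} = \lambda \pi \in 2^{Y}.\ \{x \mid \forall y.\ x \sim y \Rightarrow y \in \pi\}.$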


When trace relations are considered, the existential and universal images can be used to 
instantiate Definition 2.1 leading to the trinitarian view already mentioned in §1.


Theorem 2.6 (Trinitarian View #). For any trace relation $\sim$ and its existential and
universal images $\tilde{\tau}$ and $\tilde{\sigma}$, we have: $TP^{\tilde{\sigma}} \iff CC^{\sim} \iff TP^{\tilde{\tau}}$.
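
The equivalence can be tested exhaustively on a tiny instance. The sketch below assumes two source traces and two target traces, and considers a single program at a time, described only by its source behavior `src_beh` and the behavior `tgt_beh` of its compilation. It enumerates every relation and every pair of behaviors; all names are hypothetical.

```python
from itertools import chain, combinations, product

S = ["s1", "s2"]
T = ["t1", "t2"]


def powerset(xs):
    xs = list(xs)
    subsets = chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))
    return [frozenset(c) for c in subsets]


def three_views_agree(rel, src_beh, tgt_beh) -> bool:
    def tau(ps):      # existential image of rel
        return frozenset(t for t in T if any((s, t) in rel for s in ps))

    def sigma(pt):    # universal image of rel
        return frozenset(s for s in S if all(t in pt for t in T if (s, t) in rel))

    # Trace-relating correctness: every target trace has a related source trace.
    cc = all(any((s, t) in rel for s in src_beh) for t in tgt_beh)
    # TP via tau: each satisfied source property yields its target guarantee.
    tp_tau = all(tgt_beh <= tau(ps) for ps in powerset(S) if src_beh <= ps)
    # TP via sigma: each discharged source obligation yields the target property.
    tp_sigma = all(tgt_beh <= pt for pt in powerset(T) if src_beh <= sigma(pt))
    return cc == tp_tau == tp_sigma


ALL_AGREE = all(
    three_views_agree(rel, sb, tb)
    for rel in powerset(product(S, T))   # all 16 relations between S and T
    for sb in powerset(S)
    for tb in powerset(T)
)
print(ALL_AGREE)  # True
```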


This result relies both on Theorem 2.4 and on the fact that the existential and universal
images of a trace relation form a Galois connection. Below we further generalize
this result (Theorem 2.8) relying on a bijective correspondence between trace relations
and Galois connections on properties.


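Lemma 2.7 (Trace relations $\cong$ Galois connections on trace properties). The function
$\sim\ \mapsto \tilde{\tau} \leftrightarrows \tilde{\sigma}$ that maps a trace relation to its existential and universal images
is a bijection between trace relations $2^{Trace_S \times Trace_T}$ and Galois connections on trace
properties $2^{Trace_S} \leftrightarrows 2^{Trace_T}$. Its inverse is $\tilde{\tau} \leftrightarrows \tilde{\sigma} \mapsto \hat{\sim}$, where $s \mathrel{\hat{\sim}} t \equiv t \in \tilde{\tau}(\{s\})$.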
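
The inverse direction of the lemma can be sketched on finite data: given only the lower adjoint as a function on sets, the relation is rebuilt by applying it to singletons. The helper names and the toy relation are assumptions of this sketch.

```python
def existential_image(rel, prop):
    """tau~(prop) for a relation given as a set of (source, target) pairs."""
    return frozenset(t for (s, t) in rel if s in prop)

def relation_from_tau(tau, source_traces):
    """Rebuild the relation from tau~ alone: s ~^ t  iff  t in tau~({s})."""
    return {(s, t) for s in source_traces for t in tau(frozenset({s}))}

# Hypothetical toy relation; "s3" is related to no target trace.
SOURCE = {"s1", "s2", "s3"}
REL = {("s1", "t1"), ("s1", "t2"), ("s2", "t2")}

def tau(prop):
    return existential_image(REL, prop)

RECOVERED = relation_from_tau(tau, SOURCE)
```

On this example `RECOVERED` coincides with `REL`, which is the round trip the lemma describes.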


4 While target traces are often “more concrete” than source ones, trace properties $2^{Trace}$ (which
in Coq we represent as the function type Trace → Prop) are contravariant in Trace and thus
target properties correspond to the abstract domain.

Proof. Gardiner et al. [20] show that the existential image is a functor from the category
of sets and relations to the category of predicate transformers, mapping a set $X \mapsto 2^X$
and a relation $\sim\ \subseteq X \times Y \mapsto \tilde{\tau} : 2^X \to 2^Y$. They also show that such a functor
is an isomorphism (hence bijective) when one considers only monotonic predicate
transformers that have a unique upper adjoint. The universal image of $\sim$, $\tilde{\sigma}$, is the
unique adjoint of $\tilde{\tau}$, hence $\sim\ \mapsto \tilde{\tau} \leftrightarrows \tilde{\sigma}$ is itself bijective.


The bijection just introduced allows us to generalize Theorem 2.6 and switch between 
the three views of compiler correctness described earlier at will. 


Theorem 2.8 (Correspondence of Criteria). For any trace relation $\sim$ and corresponding
Galois connection $\tilde{\tau} \leftrightarrows \tilde{\sigma}$, we have: $TP^{\tilde{\tau}} \iff CC^{\sim} \iff TP^{\tilde{\sigma}}$.


Proof. For a trace relation $\sim$ and the Galois connection $\tilde{\tau} \leftrightarrows \tilde{\sigma}$, the result follows from
Theorem 2.6. For a Galois connection $\tilde{\tau} \leftrightarrows \tilde{\sigma}$ and $\hat{\sim}$, use Lemma 2.7 to conclude that
the existential and universal images of $\hat{\sim}$ coincide with $\tilde{\tau}$ and $\tilde{\sigma}$, respectively; the goal
then follows from Theorem 2.6.


We conclude by explicitly noting that sometimes the lifted properties may be trivial: 
the target guarantee can be the true property (the set of all traces), or the source obli- 
gation the false property (the empty set of traces). This might be the case when source 
observations abstract away too much information (§3.2 presents an example). 


2.3 Preservation of Subset-Closed Hyperproperties 


A $CC^{=}$ compiler ensures the preservation not only of trace properties, but also of all
subset-closed hyperproperties, which are known to be preserved by refinement [14]. An
example of a subset-closed hyperproperty is noninterference [14]; a $CC^{=}$ compiler thus
guarantees that if W is noninterfering with respect to the inputs and outputs in the trace
then so is $W{\downarrow}$. To be able to talk about how (hyper)properties such as noninterference
are preserved, in this section we propose another trinitarian view involving $CC^{\sim}$ and
preservation of subset-closed hyperproperties (Theorem 2.11), slightly weakened in that
source and target property mappings will need to be closed under subsets.

First, recall that a program satisfies a hyperproperty when its complete set of traces, 
which from now on we will call its behavior, is a member of the hyperproperty [14]. 


Definition 2.9 (Hyperproperty Satisfaction). A program W satisfies a hyperproperty
H, written $W \models H$, iff $beh(W) \in H$, where $beh(W) = \{ t \mid W \rightsquigarrow t \}$.


Hyperproperty preservation is a strong requirement in general. Fortunately, many inter- 
esting hyperproperties are subset-closed (SCH for short), which simplifies their preser- 
vation since it suffices to show that the behaviors of the compiled program refine the 
behaviors of the source one, which coincides with the statement of $CC^{=}$.

To talk about hyperproperty preservation in the trace-relating setting, we need an
interpretation of source hyperproperties into the target and vice versa. The one we
consider builds on top of the two trace property mappings $\tilde{\tau}$ and $\tilde{\sigma}$, which are naturally
lifted to hyperproperty mappings. This way we are able to extract two hyperproperty
mappings from a trace relation similarly to §2.2:




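Definition 2.10 (Lifting property mappings to hyperproperty mappings). Let
$\tilde{\tau} : 2^{Trace_S} \to 2^{Trace_T}$ and $\tilde{\sigma} : 2^{Trace_T} \to 2^{Trace_S}$ be arbitrary property mappings. The
images of $H_S \in 2^{2^{Trace_S}}$, $H_T \in 2^{2^{Trace_T}}$ under $\tilde{\tau}$ and $\tilde{\sigma}$ are, respectively:

$\tilde{\tau}(H_S) = \{ \tilde{\tau}(\pi_S) \mid \pi_S \in H_S \}$;   $\tilde{\sigma}(H_T) = \{ \tilde{\sigma}(\pi_T) \mid \pi_T \in H_T \}$.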


Formally we are defining two new mappings, this time on hyperproperties, but by a
small abuse of notation we still denote them by $\tilde{\tau}$ and $\tilde{\sigma}$.

Interestingly, it is not possible to apply the argument used for $CC^{=}$ to show that a
$CC^{\sim}$ compiler guarantees $W{\downarrow} \models \tilde{\tau}(H_S)$ whenever $W \models H_S$. This is in fact not true
because direct images do not necessarily preserve subset-closure [36, 44]. To fix this
we close the image of $\tilde{\tau}$ and $\tilde{\sigma}$ under subsets (denoted as $Cl_{\subseteq}$) and obtain:
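
The failure of subset-closure under direct images can be seen on a two-element example. The sketch below is an illustration only: the relation, the hyperproperty `H_S`, and all function names are assumptions chosen to make the effect visible.

```python
from itertools import chain, combinations

def powerset(elems):
    """All subsets of a finite collection, as a set of frozensets."""
    elems = list(elems)
    return {frozenset(c) for c in chain.from_iterable(
        combinations(elems, k) for k in range(len(elems) + 1))}

def existential_image(rel, prop):
    return frozenset(t for (s, t) in rel if s in prop)

def lift(rel, hyperprop):
    """tau~(H) = { tau~(pi) | pi in H }, the lifting to hyperproperties."""
    return {existential_image(rel, pi) for pi in hyperprop}

def subset_closure(hyperprop):
    """Cl(H) = { pi | exists pi' in H. pi is a subset of pi' }."""
    return {sub for pi in hyperprop for sub in powerset(pi)}

def is_subset_closed(hyperprop):
    return subset_closure(hyperprop) == set(hyperprop)

# One source trace "a" related to two target traces "x" and "y".
REL = {("a", "x"), ("a", "y")}
H_S = {frozenset(), frozenset({"a"})}   # subset-closed in the source
H_T = lift(REL, H_S)                    # {{}, {x, y}}: not subset-closed
```

The lifted hyperproperty `H_T` lacks the behavior `{x}`, which a compiled program may well exhibit; `subset_closure(H_T)` adds it back.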


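Theorem 2.11 (Preservation of Subset-Closed Hyperproperties). For any trace
relation $\sim$ and its existential and universal images lifted to hyperproperties, $\tilde{\tau}$ and $\tilde{\sigma}$,
and for $Cl_{\subseteq}(H) = \{ \pi \mid \exists \pi' \in H.\ \pi \subseteq \pi' \}$, we have:

$SCHP^{Cl_{\subseteq} \circ \tilde{\tau}} \iff CC^{\sim} \iff SCHP^{Cl_{\subseteq} \circ \tilde{\sigma}}$, where

$SCHP^{Cl_{\subseteq} \circ \tilde{\tau}} \equiv \forall W\ \forall H_S \in SCH_S.\ W \models H_S \Rightarrow W{\downarrow} \models Cl_{\subseteq}(\tilde{\tau}(H_S))$;

$SCHP^{Cl_{\subseteq} \circ \tilde{\sigma}} \equiv \forall W\ \forall H_T \in SCH_T.\ W \models Cl_{\subseteq}(\tilde{\sigma}(H_T)) \Rightarrow W{\downarrow} \models H_T$.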


Theorem 2.11 makes us aware of the potential loss of precision when interested in 
preserving subset-closed hyperproperties through compilation. In §4 we focus on a se- 
curity relevant subset-closed hyperproperty, noninterference, and show that such a loss 
of precision can be intended as a declassification of noninterference. 


3 Instances of Trace-Relating Compiler Correctness 


The trace-relating view of compiler correctness above can serve as a unifying frame- 
work for studying a range of interesting compilers. This section provides several rep- 
resentative instantiations of the framework: source languages with undefined behavior 
that compilation can turn into arbitrary target behavior (§3.1), target languages with re- 
source exhaustion that cannot happen in the source (§3.2), changes in the representation 
of values (§3.3), and differences in the granularity of data and observable events (§3.4).


3.1 Undefined Behavior 


We start by expanding upon the discussion of undefined behavior in §1. We first study 
the model of CompCert, where source and target alphabets are the same, including the 
event for undefined behavior. The trace relation weakens equality by allowing undefined 
behavior to be replaced with an arbitrary sequence of events. 


Example 3.1 (CompCert-like Undefined Behavior Relation). Source and target traces
are sequences of events drawn from $\Sigma$, where $\mathtt{Goes\_wrong} \in \Sigma$ is a terminal event that
represents an undefined behavior. We then use the trace relation from the introduction:

$s \sim t \equiv s = t \vee \exists m \le t.\ s = m \cdot \mathtt{Goes\_wrong}$.
Each trace of a target program produced by a $CC^{\sim}$ compiler is either also a trace of the
original source program or it has a finite prefix that the source program also produces,
immediately before encountering undefined behavior. As explained in §1, one of the
correctness theorems in CompCert can be rephrased as this variant of $CC^{\sim}$.

We proved that the property mappings induced by the relation can be written as:

$\tilde{\sigma}(\pi_T) = \{ s \mid s \in \pi_T \wedge s \neq m \cdot \mathtt{Goes\_wrong} \} \cup \{ m \cdot \mathtt{Goes\_wrong} \mid \forall t.\ m \le t \Rightarrow t \in \pi_T \}$;
$\tilde{\tau}(\pi_S) = \{ t \mid t \in \pi_S \} \cup \{ t \mid \exists m \le t.\ m \cdot \mathtt{Goes\_wrong} \in \pi_S \}$.


These two mappings explain what a $CC^{\sim}$ compiler ensures for the $\sim$ relation above. The
target-to-source mapping $\tilde{\sigma}$ states that to prove that a compiled program has a property
$\pi_T$ using source-level reasoning, one has to prove that any trace produced by the source
program must either be a target trace satisfying $\pi_T$ or have undefined behavior, but only
provided that any continuation of the trace substituted for the undefined behavior
satisfies $\pi_T$. The source-to-target mapping $\tilde{\tau}$ states that by compiling a program satisfying
a property $\pi_S$ we obtain a program that produces traces that satisfy the same property
or that extend a source trace that ends in undefined behavior.

These definitions can help us reason about programs. For instance, $\tilde{\sigma}$ specifies that,
to prove that an event does not happen in the target, it is not enough to prove that it
does not happen in the source: it is also necessary to prove that the source program
does not have any undefined behavior (second disjunct). Indeed, if it had an undefined
behavior, its continuations could exhibit the unwanted event.
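
The source obligation can be computed on a small finite universe of traces. In this sketch traces are tuples, the ordinary events `"a"` and `"bad"` and the bound on trace length are assumptions, and `sigma` is the universal image of the relation from Example 3.1 restricted to that universe.

```python
from itertools import product

GW = "Goes_wrong"
EVENTS = ["a", "bad"]   # hypothetical ordinary events

def traces(max_len):
    """All traces up to max_len, with Goes_wrong allowed only as the last event."""
    out = []
    for n in range(max_len + 1):
        for body in product(EVENTS, repeat=n):
            out.append(body)
            if n < max_len:
                out.append(body + (GW,))
    return out

def related(s, t):
    """s ~ t  iff  s = t, or s = m . Goes_wrong for some prefix m of t."""
    if s == t:
        return True
    return len(s) >= 1 and s[-1] == GW and t[:len(s) - 1] == s[:-1]

UNIVERSE = traces(2)

def sigma(target_prop):
    """Source traces all of whose related target traces lie in target_prop."""
    return {s for s in UNIVERSE
            if all(t in target_prop for t in UNIVERSE if related(s, t))}

# Target property: the event "bad" never happens.
NO_BAD = {t for t in UNIVERSE if "bad" not in t}
OBLIGATION = sigma(NO_BAD)
```

The trace `("a", "Goes_wrong")` contains no `"bad"` event, yet it is excluded from `OBLIGATION` because one of its target continuations is `("a", "bad")`.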


This relation can be easily generalized to other settings. For instance, consider the
setting in which we compile down to a low-level language like machine code. Target
traces can now contain new events that cannot occur in the source: indeed, in modern
architectures like x86 a compiler typically uses only a fraction of the available
instruction set. Some instructions might even perform dangerous operations, such as writing
to the hard drive. Formally, the source and target do not have the same events any more.
Thus, we consider a source alphabet $\Sigma_S = \Sigma \cup \{\mathtt{Goes\_wrong}\}$, and a target
alphabet $\Sigma_T = \Sigma \cup \Sigma'$. The trace relation is defined in the same way and we obtain the
same property mappings as above, except that, since target traces now have more events
(some of which may be dangerous), the arbitrary continuations of target traces get
more interesting. For instance, consider a new event that represents writing data on the
hard drive, and suppose we want to prove that this event cannot happen for a compiled
program. Then, proving this property requires exactly proving that the source program
exhibits no undefined behavior [11]. More generally, what one can prove about
target-only events can only be either that they cannot appear (because there is no undefined
behavior) or that any of them can appear (in the case of undefined behavior).

In §5.2 we study a similar example, showing that even in a safe language linked ad- 
versarial contexts can cause dangerous target events that have no source correspondent. 


3.2 Resource Exhaustion 

Let us return to the discussion about resource exhaustion in §1.

Example 3.2 (Resource Exhaustion). We consider traces made of events drawn from
$\Sigma_S$ in the source, and $\Sigma_T = \Sigma_S \cup \{\mathtt{Resource\_Limit\_Hit}\}$ in the target. Recall the
trace relation for resource exhaustion:

$s \sim t \equiv s = t \vee \exists m \le s.\ t = m \cdot \mathtt{Resource\_Limit\_Hit}$.

Formally, this relation is similar to the one for undefined behavior, except this time it is
the target trace that is allowed to end early instead of the source trace.

The induced trace property mappings $\tilde{\sigma}$ and $\tilde{\tau}$ are the following:

$\tilde{\sigma}(\pi_T) = \{ s \mid s \in \pi_T \} \cap \{ s \mid \forall m \le s.\ m \cdot \mathtt{Resource\_Limit\_Hit} \in \pi_T \}$;
$\tilde{\tau}(\pi_S) = \pi_S \cup \{ m \cdot \mathtt{Resource\_Limit\_Hit} \mid \exists s \in \pi_S.\ m \le s \}$.

These capture the following intuitions. The target-to-source mapping $\tilde{\sigma}$ states that to
prove a property of the compiled program one has to show that the traces of the source
program satisfy two conditions: (1) they must also satisfy the target property; and (2)
the termination of every one of their prefixes by a resource exhaustion error must be
allowed by the target property. This is rather restrictive: any property that prevents
resource exhaustion cannot be proved using source-level reasoning. Indeed, if $\pi_T$ does
not allow resource exhaustion, then $\tilde{\sigma}(\pi_T) = \emptyset$. This is to be expected since resource
exhaustion is simply not accounted for at the source level. The other mapping $\tilde{\tau}$ states
that a compiled program produces traces that either belong to the same properties as the
traces of the source program or end early due to resource exhaustion.
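
Both mappings can be evaluated on a bounded universe of traces. In this sketch traces are tuples, the alphabet `EVENTS` and the length bound are assumptions, and `tau` and `sigma` are the existential and universal images of the resource exhaustion relation restricted to that universe.

```python
from itertools import product

RLH = "Resource_Limit_Hit"
EVENTS = ["a", "b"]   # hypothetical source events

def source_traces(max_len):
    return [body for n in range(max_len + 1)
            for body in product(EVENTS, repeat=n)]

def related(s, t):
    """s ~ t  iff  s = t, or t = m . Resource_Limit_Hit for some prefix m of s."""
    if s == t:
        return True
    return len(t) >= 1 and t[-1] == RLH and s[:len(t) - 1] == t[:-1]

SOURCE = source_traces(2)
TARGET = SOURCE + [m + (RLH,) for m in SOURCE]

def tau(source_prop):
    """Target traces related to some source trace in source_prop."""
    return {t for t in TARGET if any(related(s, t) for s in source_prop)}

def sigma(target_prop):
    """Source traces all of whose related target traces lie in target_prop."""
    return {s for s in SOURCE
            if all(t in target_prop for t in TARGET if related(s, t))}

# Target property that forbids resource exhaustion altogether.
NEVER_EXHAUSTS = {t for t in TARGET if RLH not in t}
```

Since the empty trace is a prefix of every source trace, every source trace is related to the target trace consisting of `Resource_Limit_Hit` alone, so `sigma(NEVER_EXHAUSTS)` is empty.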

In this example, safety properties [31] are mapped (in both directions) to other safety
properties. This can be desirable for a relation: since safety properties are usually
easier to reason about, one interested only in safety properties at the target can reason
about them using source-level reasoning tools for safety properties.

The compiler correctness theorem in CakeML is an instance of $CC^{\sim}$ for the $\sim$
relation above. We have also implemented two small compilers that are correct for this
relation. The full details can be found in the Coq development in the supplementary
materials. The first compiler goes from a simple expression language (similar to the
one in §3.3 but without inputs) to the same language except that execution is bounded by
some amount of fuel: each execution step consumes some amount of fuel and execution
immediately halts when it runs out of fuel. The compiler is the identity.

The second compiler is more interesting: we proved this $CC^{\sim}$ instance for a
variant of a compiler from a WHILE language to a simple stack machine by Xavier
Leroy [35]. We enriched the two languages with outputs and modified the semantics of
the stack machine so that it falls into an error state if the stack reaches a certain size.
The proof uses a standard forward simulation modified to account for failure.


We conclude this subsection by noting that the resource exhaustion relation and
the undefined behavior relation from the previous subsection can easily be combined.
Indeed, given a relation $\sim_{UB}$ and a relation $\sim_{RE}$ defined as above on the same sets of
traces, we can build a new relation $\sim$ that allows both refinement of undefined behavior
and resource exhaustion by taking their union: $s \sim t \equiv s \sim_{UB} t \vee s \sim_{RE} t$. A compiler
that is $CC^{\sim_{UB}}$ or $CC^{\sim_{RE}}$ is trivially $CC^{\sim}$, though the converse is not true.


3.3 Different Source and Target Values 


We now illustrate trace-relating compilation for a translation mapping source-level 
booleans to target-level natural numbers. Given the simplicity of this compiler, most 
of the details of the formalization are deferred to the online appendix. 

The source language is a pure, statically typed expression language whose
expressions e include naturals n, booleans b, conditionals, arithmetic and relational operations,
boolean inputs $in_b$, and natural inputs $in_n$. A trace s is a list of inputs $is$ paired with a
result r, which can be a natural, a boolean, or an error. Well-typed programs never
produce error. Types ty are either N (naturals) or B (booleans); typing is standard. The
source language has a standard big-step operational semantics ($e \rightsquigarrow (is, r)$) which tells
how an expression e generates a trace $(is, r)$. The target language is analogous, except
that it is untyped, only has naturals n and its only inputs are naturals $in_n$. The semantics
of the target language is also given in big-step style. Since we only have naturals and
all expressions operate on them, no error result is possible in the target.
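The compiler is homomorphic, translating a source expression to the same target
expression; the only differences are natural numbers (and conditionals), as noted below.

$true{\downarrow} = 1$     $in_b{\downarrow} = in_n$     $(e_1 \le e_2){\downarrow} = \text{if } e_1{\downarrow} \le e_2{\downarrow} \text{ then } 1 \text{ else } 0$
$false{\downarrow} = 0$    $in_n{\downarrow} = in_n$     $(\text{if } e_1 \text{ then } e_2 \text{ else } e_3){\downarrow} = \text{if } e_1{\downarrow} \le 0 \text{ then } e_3{\downarrow} \text{ else } e_2{\downarrow}$

When compiling an if-then-else the target condition $e_1{\downarrow} \le 0$ is used to check that $e_1$ is
false, and therefore the then and else branches of the source are swapped in the target.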
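
A minimal executable sketch of this translation follows. The encoding of expressions as nested tuples, the four-argument target node `"ifle"` (standing for "if a ≤ b then c else d"), and the function names are assumptions of the sketch, not the paper's Coq definitions; arithmetic operations are omitted.

```python
def compile_expr(e):
    """Translate a source expression (nested tuples) into a target expression."""
    tag = e[0]
    if tag == "nat":
        return e
    if tag == "bool":
        return ("nat", 1 if e[1] else 0)
    if tag in ("in_b", "in_n"):
        return ("in_n",)          # every input becomes a natural input
    if tag == "le":
        return ("ifle", compile_expr(e[1]), compile_expr(e[2]),
                ("nat", 1), ("nat", 0))
    if tag == "if":
        # The target tests "condition <= 0" (condition is false),
        # so the two branches are swapped.
        return ("ifle", compile_expr(e[1]), ("nat", 0),
                compile_expr(e[3]), compile_expr(e[2]))
    raise ValueError(f"unknown source expression: {tag}")

def eval_target(e, inputs):
    """Big-step evaluation of a target expression; inputs iterates over naturals."""
    tag = e[0]
    if tag == "nat":
        return e[1]
    if tag == "in_n":
        return next(inputs)
    if tag == "ifle":
        lhs = eval_target(e[1], inputs)
        rhs = eval_target(e[2], inputs)
        return eval_target(e[3] if lhs <= rhs else e[4], inputs)
    raise ValueError(f"unknown target expression: {tag}")

# if in_b then 10 else 20
PROGRAM = ("if", ("in_b",), ("nat", 10), ("nat", 20))
COMPILED = compile_expr(PROGRAM)
```

Running `COMPILED` on the target input 0 selects the source else branch, while any non-zero input such as 7 selects the then branch.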


Relating Traces. We relate basic values (naturals and booleans) in a non-injective
fashion as noted below. Then, we extend the relation to lists of inputs pointwise (Rules Empty
and Cons) and lift that relation to traces (Rules Nat and Bool).

$n \sim n$        $true \sim n$ if $n > 0$        $false \sim 0$

(Empty) $\emptyset \sim \emptyset$
(Cons) $\dfrac{i \sim i' \quad is \sim is'}{i \cdot is \sim i' \cdot is'}$
(Nat) $\dfrac{is \sim is' \quad n \sim n}{(is, n) \sim (is', n)}$
(Bool) $\dfrac{is \sim is' \quad b \sim n}{(is, b) \sim (is', n)}$


Property mappings. The property mappings $\tilde{\sigma}$ and $\tilde{\tau}$ induced by the trace relation $\sim$
defined above capture the intuition behind encoding booleans as naturals:
— the source-to-target mapping allows true to be encoded by any non-zero number; 
— the target-to-source mapping requires that 0 be replaceable by both 0 and false. 
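
The value and trace relation can be written as a small decision procedure. In this sketch source booleans are Python `bool` values, naturals are `int` values, a trace is a pair of an input list and a result, and error results are left out; these encodings and the function names are assumptions.

```python
def related_value(src, tgt):
    """true ~ n if n > 0; false ~ 0; n ~ n for naturals."""
    if type(src) is bool:
        return tgt > 0 if src else tgt == 0
    return src == tgt

def related_inputs(src_inputs, tgt_inputs):
    """Pointwise extension to input lists (rules Empty and Cons)."""
    return (len(src_inputs) == len(tgt_inputs)
            and all(related_value(i, j)
                    for i, j in zip(src_inputs, tgt_inputs)))

def related_trace(src_trace, tgt_trace):
    """Lift to traces (rules Nat and Bool): related inputs and related results."""
    (src_inputs, src_result), (tgt_inputs, tgt_result) = src_trace, tgt_trace
    return (related_inputs(src_inputs, tgt_inputs)
            and related_value(src_result, tgt_result))
```

Non-injectivity shows up directly: both `related_value(0, 0)` and `related_value(False, 0)` hold, so the target value 0 has two source counterparts.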


Compiler correctness. With the relation above, the compiler is proven to satisfy CC~. 


Theorem 3.3 ($\cdot{\downarrow}$ is correct). $\cdot{\downarrow}$ is $CC^{\sim}$.


Simulations with different traces. The difficulty in proving Theorem 3.3 arises from
the trace-relating compilation setting: for compilation chains that have the same source
and target traces, it is customary to prove compiler correctness using a forward
simulation (i.e., a simulation between source and target transition systems); then, using
determinacy [18, 39] of the target language and input totality [19, 63] (aka receptiveness) of
the source, this forward simulation is flipped into a backward simulation (a simulation
between target and source transition systems), as described by Beringer et al. [7] and Leroy
[34]. This flipping is useful because forward simulations are often much easier to prove
(by induction on the transitions of the source) than backward ones, as is the case here.

We first give the main idea of the flipping proof, when the inputs are the same in
the source and the target [7, 34]. We only consider inputs, as it is the most interesting
case, since with determinacy, nondeterminism only occurs on inputs. Given a forward
simulation R, and a target program $W_T$ that simulates a source program $W_S$, $W_T$ is
able to perform an input iff so is $W_S$: otherwise, say for instance that $W_S$ performs an
output, by forward simulation $W_T$ would also perform an output, which is impossible
because of determinacy. By input totality of the source, $W_S$ must be able to perform
the exact same input as $W_T$; using forward simulation and determinacy, the resulting
programs must be related.

[Diagram: $W_S \mathrel{R} W_T$ at the top, input steps $i_1$ and $i_2$, and the resulting programs $W_{S1}$ and $W_{T1}$ related by R at the bottom. Annotations: "By input totality"; "By contradiction, using forward simulation and determinacy"; "By forward simulation and determinacy".]

However, our trace relation is not injective (both 0 and false are mapped to 0),
therefore these arguments do not apply: not all possible inputs of target programs are
accounted for in the forward simulation. We thus have to strengthen the forward
simulation assumption, requiring the following additional property to hold, for any source
program $W_S$ and target program $W_T$ related by the forward simulation R.

[Diagram: the flippability property. It starts from $W_S \mathrel{R} W_T$, with input steps labeled $i_{S1}$, $i_{T1}$, $i_{S2}$ and $i_{T2}$, where $i_{S1} \sim i_{T1}$ and $i_{S2} \sim i_{T2}$, and with resulting programs $W_{S1} \mathrel{R} W_{T1}$.]


We say that a forward simulation for which this property holds is flippable. For our
example compiler, a flippable forward simulation works as follows: whenever a boolean
input occurs in the source, the target program must perform every strictly positive input
n (and not just 1, as suggested by the compiler). Using this property, determinacy of
the target, input totality of the source, as well as the fact that any target input has an
inverse image through the relation, we can indeed show that the forward simulation can
be turned into a backward one: starting from $W_S \mathrel{R} W_T$ and an input $i_{T2}$, we show
that there is $i_{S1}$ and $i_{T1}$ as in the diagram above, using the same arguments as when the
inputs are the same; because the simulation is flippable, we can close the diagram, and
obtain the existence of an adequate $i_{S2}$. From this we obtain $CC^{\sim}$.


In fact, we have proven a completely general ‘flipping theorem’, with this flippable
hypothesis on the forward simulation. We have also shown that if the relation $\sim$
defines a bijection between the inputs of the source and the target, then any forward
simulation is flippable, hence reobtaining the usual proof technique [7, 34] as a special
case. This flipping theorem is further discussed in the online appendix.


3.4 Abstraction Mismatches 


We now consider how to relate traces where a single source action is compiled to
multiple target ones. To illustrate this, we take a pure, statically-typed source language that
can output (nested) pairs of arbitrary size, and a pure, untyped target language where
sent values have a fixed size. Concretely, the source is analogous to the language of §3.3,
except that it does not have inputs or booleans and it has an expression send e, which
can emit a (nested) pair e of values in a single action. That is, given that e reduces
to a pair, e.g., (v1, (v2, v3)), expression send (v1, (v2, v3)) emits action (v1, (v2, v3)).
That expression is compiled into a sequence of individual sends in the target language
send v1 ; send v2 ; send v3, since in the target, send e sends the value that e
reduces to, but the language has no pairs.

Due to space constraints we omit the full formalization of these simple languages
and of the homomorphic compiler $(\cdot){\downarrow} : e \to e$. The only interesting bit is the
compilation of the send · expression, which relies on the gensend (·) function below.
That function takes a source expression of a given type and returns a sequence of target
send · instructions that send each element of the expression.

gensend ($\vdash e : \tau$) = send ($\vdash e : N$)${\downarrow}$                                     if $\tau = N$
gensend ($\vdash e : \tau$) = gensend ($\vdash e.1 : \tau'$); gensend ($\vdash e.2 : \tau''$)     if $\tau = \tau' \times \tau''$
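
The type-directed recursion of gensend can be sketched as follows. Types are encoded as the string `"N"` or a pair of types, projections as `("proj1", e)` and `("proj2", e)` nodes, and the result as a list of `("send", e)` instructions; all of these encodings are assumptions of the sketch, and the compilation of the sent subexpression itself is omitted.

```python
def gensend(expr, ty):
    """Emit one target send per natural component of expr, following its type."""
    if ty == "N":
        return [("send", expr)]
    left_ty, right_ty = ty            # ty is a pair type
    return (gensend(("proj1", expr), left_ty)
            + gensend(("proj2", expr), right_ty))

# A value of type N x (N x N), as in the example (v1, (v2, v3)).
INSTRUCTIONS = gensend("e", ("N", ("N", "N")))
```

For this type the result contains three sends, for `e.1`, `e.2.1` and `e.2.2`, in left-to-right order.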


Relating Traces. We start with the trivial relation between numbers: $n \sim^0 n$, i.e.,
numbers are related when they are the same. We cannot build a relation between single
actions since a single source action is related to multiple target ones. Therefore, we define
a relation between a source action M and a target trace t (a list of numbers), inductively
on the structure of M (which is a pair of values, and values are natural numbers or pairs).

(Trace-Rel-N-N) $\dfrac{n \sim^0 n \quad n' \sim^0 n'}{(n, n') \sim n \cdot n'}$
(Trace-Rel-N-M) $\dfrac{n \sim^0 n \quad M \sim t}{(n, M) \sim n \cdot t}$
(Trace-Rel-M-N) $\dfrac{M \sim t \quad n \sim^0 n}{(M, n) \sim t \cdot n}$
(Trace-Rel-M-M) $\dfrac{M \sim t \quad M' \sim t'}{(M, M') \sim t \cdot t'}$

A pair of naturals is related to the two actions that send each element of the pair
(Rule Trace-Rel-N-N). If a pair is made of sub-pairs, we require all such sub-pairs to be
related (Rules Trace-Rel-N-M to Trace-Rel-M-M). We build on these rules to define the
$s \sim t$ relation between source and target traces for which the compiler is correct
(Theorem 3.4). Trivially, traces are related when they are both empty. Alternatively, given
related traces, we can concatenate a source action and a second target trace
provided that they are related (Rule Trace-Rel-Single).

(Trace-Rel-Single) $\dfrac{s \sim t \quad M \sim t'}{s \cdot M \sim t \cdot t'}$


Theorem 3.4 ($(\cdot){\downarrow}$ is correct). $(\cdot){\downarrow}$ is $CC^{\sim}$.


With our trace relation, the trace property mappings capture the following intuitions:

— The target-to-source mapping states that a source property can reconstruct target
action as it sees fit. For example, trace 4 · 6 · 5 · 7 is related to (4, 6) · (5, 7) and
(4, (6, (5, 7))) (and many more variations). This gives freedom to the source
implementation of a target behavior, which follows from the non-injectivity of $\sim$.$^5$

— The source-to-target mapping “forgets” about the way pairs are nested, but is
faithful w.r.t. the values $v_i$ contained in a message. Notice that source safety properties
are always mapped to target safety properties. For instance, if $\pi_S \in Safety_S$
prescribes that some bad number is never sent, then $\tilde{\tau}(\pi_S)$ prescribes the same number
is never sent in the target and $\tilde{\tau}(\pi_S) \in Safety_T$. Of course if $\pi_S \in Safety_S$
prescribes that a particular nested pairing like (4, (6, (5, 7))) never happens, then $\tilde{\tau}(\pi_S)$
is still a target safety property, but the trivial one, since $\tilde{\tau}(\pi_S) = \top \in Safety_T$.
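
Read operationally, the rules relate a source trace to the target trace obtained by flattening each nested pair from left to right. The sketch below uses that reading; source actions are Python tuples of naturals, target traces are lists of naturals, and the function names are assumptions.

```python
def flatten(message):
    """Naturals sent in the target for one source message (a natural or a pair)."""
    if isinstance(message, int):
        return [message]
    left, right = message
    return flatten(left) + flatten(right)

def related(source_trace, target_trace):
    """s ~ t iff concatenating the flattened source actions yields t."""
    flat = [n for message in source_trace for n in flatten(message)]
    return flat == list(target_trace)

TARGET_TRACE = [4, 6, 5, 7]
```

Both `[(4, 6), (5, 7)]` and `[(4, (6, (5, 7)))]` are related to `TARGET_TRACE`, which is the non-injectivity mentioned in the first item above.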


4 Trace-Relating Compilation and Noninterference Preservation 


When source and target observations are drawn from the same set, a correct compiler
($CC^{=}$) is enough to ensure the preservation of all subset-closed hyperproperties, in
particular of noninterference (NI) [22], as also mentioned at the beginning of §2.3. In the
scenario where target observations are strictly more informative than source
observations, the best guarantee one may expect from a correct trace-relating compiler ($CC^{\sim}$)
is a weakening (or declassification) of target noninterference that matches the
noninterference property satisfied in the source. To formalize this reasoning, this section applies
the trinitarian view of trace-relating compilation to the general framework of abstract
noninterference (ANI) [21].

5 Making $\sim$ injective is a matter of adding open and close parenthesis actions in target traces.

We first define NI and explain the issue of preserving source NI via a CC~ compiler. 
We then introduce ANI, which allows characterizations of various forms of noninterfer- 
ence, and formulate a general theory of ANI preservation via CC~. We also study how 
to deal with cases such as undefined behavior in the target. Finally, we answer the dual 
question, i.e., which source NI should be satisfied to guarantee that compiled programs 
are noninterfering with respect to target observers. 


Intuitively, NI requires that publicly observable outputs do not reveal information
about private inputs. To define this formally, we need a few additions to our setup. We
indicate the (disjoint) input and output projections of a trace t as $t^{\circ}$ and $t^{\bullet}$ respectively.$^6$
Denote with $[t]_{low}$ the equivalence class of a trace t, obtained using a standard
low-equivalence relation that relates low (public) events only if they are equal, and ignores
any difference between private events. Then, NI for source traces can be defined as:

$NI_S = \{ \pi_S \mid \forall s_1 s_2 \in \pi_S.\ [s_1^{\circ}]_{low} = [s_2^{\circ}]_{low} \Rightarrow [s_1^{\bullet}]_{low} = [s_2^{\bullet}]_{low} \}$.

That is, source NI comprises the sets of traces that have equivalent low output
projections as long as their low input projections are equivalent.
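
The definition can be checked directly on a finite set of traces. In this sketch a trace is a pair of an input projection and an output projection, each a tuple of `(level, value)` events with level `"L"` or `"H"`; hiding the value of high events while keeping their position is one possible low-equivalence and, like the names below, an assumption of the sketch.

```python
def low_view(events):
    """Keep low events as they are and hide the value of private ones."""
    return tuple(value if level == "L" else None
                 for (level, value) in events)

def is_noninterfering(prop):
    """prop is in NI iff low-equivalent inputs imply low-equivalent outputs."""
    return all(low_view(o1) == low_view(o2)
               for (i1, o1) in prop for (i2, o2) in prop
               if low_view(i1) == low_view(i2))

# Two runs that differ only in the private input (7 versus 9).
SAFE = {
    ((("L", 1), ("H", 7)), (("L", 1),)),
    ((("L", 1), ("H", 9)), (("L", 1),)),
}
LEAKY = {
    ((("L", 1), ("H", 7)), (("L", 7),)),
    ((("L", 1), ("H", 9)), (("L", 9),)),
}
```

`SAFE` is noninterfering, whereas `LEAKY` copies the private input to a public output and is rejected.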


Trace-Relating Compilation and Noninterference. When additional observations are
possible in the target, it is unclear whether a noninterfering source program is compiled
to a noninterfering target program or not, and if so, whether the notion of NI in the
target is the expected or desired one. We illustrate this issue considering a scenario where
target traces extend source ones by exposing the execution time. While source
noninterference $NI_S$ requires that private inputs do not affect public outputs, $NI_T$ additionally
requires that the execution time is not affected by private inputs.

To model the scenario described, let $Trace_S$ denote the set of traces in the source,
and $Trace_T = Trace_S \times \mathbb{N}^{\omega}$ be the set of target traces, where $\mathbb{N}^{\omega} \triangleq \mathbb{N} \cup \{\omega\}$.
Target traces have two components: a source trace, and a natural number that denotes
the time spent to produce the trace ($\omega$ if infinite). Notice that if two source traces
$s_1, s_2$ are low-equivalent then $\{s_1, s_2\} \in NI_S$ and $\{(s_1, 42), (s_2, 42)\} \in NI_T$, but
$\{(s_1, 42), (s_2, 43)\} \notin NI_T$ and $\{(s_1, 42), (s_2, 42), (s_1, 43), (s_2, 43)\} \notin NI_T$.

Consider the following straightforward trace relation, which relates a source trace 
to any target trace whose first component is equal to it, irrespective of execution time: 


$s \sim t \equiv \exists n.\ t = (s, n)$.

A compiler is $CC^{\sim}$ if any trace that can be exhibited in the target can be simulated
in the source in some amount of time. For such a compiler Theorem 2.11 says that
if W satisfies $NI_S$, then $W{\downarrow}$ satisfies $Cl_{\subseteq} \circ \tilde{\tau}(NI_S)$, which however is strictly weaker
than $NI_T$, as it contains, e.g., $\{(s_1, 42), (s_2, 42), (s_1, 43), (s_2, 43)\}$, and one cannot
conclude that $W{\downarrow}$ is noninterfering in the target. It is easy to prove that

$Cl_{\subseteq} \circ \tilde{\tau}(NI_S) = Cl_{\subseteq}(\{ \pi_S \times \mathbb{N}^{\omega} \mid \pi_S \in NI_S \}) = \{ \pi_S \times \mathcal{I} \mid \pi_S \in NI_S \wedge \mathcal{I} \subseteq \mathbb{N}^{\omega} \}$,

the first equality coming from $\tilde{\tau}(\pi_S) = \pi_S \times \mathbb{N}^{\omega}$, and the second from $NI_S$ being
subset-closed. As we will see, this hyperproperty can be characterized as a form of
NI, which one might call timing-insensitive noninterference, and ensured only against
attackers that cannot measure execution time. For this characterization, and to describe
different forms of noninterference as well as formally analyze their preservation by a
$CC^{\sim}$ compiler, we rely on the general framework of abstract noninterference [21].

6 Here we only require the projections to be disjoint. Depending on the scenario and the attacker
model the projections might record information such as the ordering of events.
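
The timing example can be replayed on finite data. In this sketch the two-element list `TIMES` stands in for $\mathbb{N}^{\omega}$, source traces are pairs of input and output projections made of `(level, value)` events, and `ni_target` encodes the reading that execution time must also be independent of private inputs; these choices and all names are assumptions.

```python
def low_view(events):
    """Keep low events and hide the value of private ones."""
    return tuple(v if level == "L" else None for (level, v) in events)

TIMES = [42, 43]   # finite stand-in for the set of possible execution times

def tau(source_prop):
    """Existential image of s ~ (s, n): pair each source trace with every time."""
    return {(s, n) for s in source_prop for n in TIMES}

def ni_source(prop):
    return all(low_view(o1) == low_view(o2)
               for (i1, o1) in prop for (i2, o2) in prop
               if low_view(i1) == low_view(i2))

def ni_target(prop):
    """As ni_source, and in addition the execution times must coincide."""
    return all(low_view(o1) == low_view(o2) and n1 == n2
               for ((i1, o1), n1) in prop for ((i2, o2), n2) in prop
               if low_view(i1) == low_view(i2))

# Two low-equivalent source traces differing only in a private input.
S1 = ((("L", 1), ("H", 7)), (("L", 1),))
S2 = ((("L", 1), ("H", 9)), (("L", 1),))
```

The image `tau({S1, S2})` pairs both traces with both times, so it fails `ni_target` even though `{S1, S2}` satisfies `ni_source`.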


Abstract Noninterference. ANI [21] is a generalization of NI whose formulation
relies on abstractions (in the abstract interpretation sense [16]) in order to encompass
arbitrary variants of NI. ANI is parameterized by an observer abstraction $\rho$, which denotes
the distinguishing power of the attacker, and a selection abstraction $\phi$, which specifies
when to check NI, and therefore captures a form of declassification [54].$^7$ Formally:

$ANI^{\rho}_{\phi} = \{ \pi \mid \forall t_1 t_2 \in \pi.\ \phi(t_1^{\circ}) = \phi(t_2^{\circ}) \Rightarrow \rho(t_1^{\bullet}) = \rho(t_2^{\bullet}) \}$.

By picking $\phi = \rho = [\cdot]_{low}$, we recover the standard noninterference defined above,
where NI must hold for all low inputs (i.e., no declassification of private inputs), and
the observational power of the attacker is limited to distinguishing low outputs.

The observational power of the attacker can be weakened by choosing a more liberal
relation for $\rho$. For instance, one may limit the attacker to observe the parity of output
integer values. Another way to weaken ANI is to use $\phi$ to specify that noninterference
is only required to hold for a subset of low inputs.
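
A generic checker parameterized by the two abstractions makes the parity example concrete. Here the abstractions are applied to single projections (the singleton convenience notation), inputs are `(low, high)` pairs, outputs are integers, and the names `ani`, `low_input`, `identity` and `parity` are assumptions of the sketch.

```python
def ani(prop, phi, rho):
    """prop satisfies ANI iff phi-equal inputs imply rho-equal outputs."""
    return all(rho(o1) == rho(o2)
               for (i1, o1) in prop for (i2, o2) in prop
               if phi(i1) == phi(i2))

def low_input(inp):
    """Selection abstraction: only the low component of the input is compared."""
    return inp[0]

def identity(out):
    """Observer that sees the output exactly."""
    return out

def parity(out):
    """Weaker observer that only sees whether the output is even or odd."""
    return out % 2

# Same low input 1, different private inputs, outputs 2 and 4.
PROP = {((1, 7), 2), ((1, 9), 4)}
```

With the exact observer the two runs are distinguishable, so `ani(PROP, low_input, identity)` fails; with the parity observer both outputs look the same and the property holds.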

To be formally precise, $\phi$ and $\rho$ are defined over sets of (input and output projections
of) traces, so when we write $\phi(t)$ above, this should be understood as a convenience
notation for $\phi(\{t\})$. Likewise, $\phi = [\cdot]_{low}$ should be understood as $\phi = \lambda \pi.\ \bigcup_{t \in \pi} [t]_{low}$,
i.e., the powerset lifting of $[\cdot]_{low}$. Additionally, $\phi$ and $\rho$ are required to be upper-closed
operators (uco), i.e., monotonic, idempotent and extensive, on the poset that is the
powerset of (input and output projections of) traces ordered by inclusion [21].


Trace-Relating Compilation and ANI for Timing. We can now reformulate our
example with observable execution times in the target in terms of ANI. We have
$NI_S = ANI^{\rho_S}_{\phi_S}$ with $\phi_S = \rho_S = [\cdot]_{low}$. In this case, we can formally describe the hyperproperty
that a compiled program $W{\downarrow}$ satisfies whenever W satisfies $NI_S$ as an instance of ANI:

$Cl_{\subseteq} \circ \tilde{\tau}(NI_S) = ANI^{\rho_T}_{\phi_T}$,

for $\phi_T = \phi_S$ and $\rho_T(\pi_T) = \{ (s, n) \mid \exists (s_1, n_1) \in \pi_T.\ [s^{\bullet}]_{low} = [s_1^{\bullet}]_{low} \}$.

The definition of $\phi_T$ tells us that the trace relation does not affect the selection
abstraction. The definition of $\rho_T$ characterizes an observer that cannot distinguish execution
times for noninterfering traces (notice that $n_1$ in the definition of $\rho_T$ is discarded). For
instance, $\rho_T(\{(s, n_1)\}) = \rho_T(\{(s, n_2)\})$, for any $s, n_1, n_2$. Therefore, in this setting,
we know explicitly through $\rho_T$ that a $CC^{\sim}$ compiler degrades source noninterference
to target timing-insensitive noninterference.


Trace-Relating Compilation and ANI in General. While the particular $\phi_T$ and $\rho_T$
above can be discovered by intuition, we want to know whether there is a systematic
way of obtaining them in general. In other words, for any trace relation $\sim$ and any
notion of source NI, what property is guaranteed on noninterfering source programs by
any $CC^{\sim}$ compiler?

We can now answer this question generally (Theorem 4.1): any source notion of
noninterference expressible as an instance of ANI is mapped to a corresponding
instance of ANI in the target, whenever source traces are an abstraction of target ones
(i.e., when $\sim$ is a total and surjective map). For this result we consider trace relations
that can be split into input and output trace relations (denoted as $\sim\ \triangleq \langle \sim^{\circ}, \sim^{\bullet} \rangle$) such that
$s \sim t \iff s^{\circ} \sim^{\circ} t^{\circ} \wedge s^{\bullet} \sim^{\bullet} t^{\bullet}$. The trace relation $\sim$ corresponds to a Galois connection
between the sets of trace properties $\tilde{\tau} \leftrightarrows \tilde{\sigma}$ as described in §2.2. Similarly, the pair $\sim^{\circ}$
and $\sim^{\bullet}$ corresponds to a pair of Galois connections, $\tilde{\tau}^{\circ} \leftrightarrows \tilde{\sigma}^{\circ}$ and $\tilde{\tau}^{\bullet} \leftrightarrows \tilde{\sigma}^{\bullet}$, between the
sets of input and output properties. In the timing example, time is an output so we have
$\sim\ \triangleq \langle =, \sim^{\bullet} \rangle$ and $\sim^{\bullet}$ is defined as $s^{\bullet} \sim^{\bullet} t^{\bullet} \equiv \exists n.\ t^{\bullet} = (s^{\bullet}, n)$.

7 ANI includes a third parameter $\eta$, which describes the maximal input variation that the attacker
may control. Here we omit $\eta$ (i.e., take it to be the identity) in order to simplify the presentation.


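Theorem 4.1 (Compiling ANI). Assume traces of source and target languages are
related via $\sim\ \subseteq Trace_S \times Trace_T$, $\sim\ \triangleq \langle \sim^{\circ}, \sim^{\bullet} \rangle$ such that $\sim^{\circ}$ and $\sim^{\bullet}$ are both total
maps from target to source traces, and $\sim^{\circ}$ is surjective. Assume ${\downarrow}$ is a $CC^{\sim}$ compiler,
and $\phi_S \in uco(2^{Trace_S^{\circ}})$, $\rho_S \in uco(2^{Trace_S^{\bullet}})$.

If W satisfies $ANI^{\rho_S}_{\phi_S}$, then $W{\downarrow}$ satisfies $ANI^{\rho_T^{\#}}_{\phi_T^{\#}}$, where $\phi_T^{\#}$ and $\rho_T^{\#}$ are defined as:

$\phi_T^{\#} = g^{\circ} \circ \phi_S \circ f^{\circ}$;   $\rho_T^{\#} = g^{\bullet} \circ \rho_S \circ f^{\bullet}$;   and
$f^{\circ}(\pi_T^{\circ}) = \{ s^{\circ} \mid \exists t^{\circ} \in \pi_T^{\circ}.\ s^{\circ} \sim^{\circ} t^{\circ} \}$;   $g^{\circ}(\pi_S^{\circ}) = \{ t^{\circ} \mid \forall s^{\circ}.\ s^{\circ} \sim^{\circ} t^{\circ} \Rightarrow s^{\circ} \in \pi_S^{\circ} \}$
(and both $f^{\bullet}$ and $g^{\bullet}$ are defined analogously).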


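For the example above we recover the definitions we justified intuitively, i.e., $\varphi_T^{\#} =
g^{\bullet} \circ \varphi_S \circ f^{\bullet} = \varphi_T$ and $\rho_T^{\#} = g^{\circ} \circ \rho_S \circ f^{\circ} = \rho_T$. Moreover, we can prove that if $\sim^{\circ}$
is also surjective, $\mathrm{ANI}_{\varphi_T^{\#}}^{\rho_T^{\#}} \subseteq \mathit{Cl}_{\subseteq} \circ \tilde{\tau}(\mathrm{ANI}_{\varphi_S}^{\rho_S})$. Therefore, the derived guarantee $\mathrm{ANI}_{\varphi_T^{\#}}^{\rho_T^{\#}}$ is
at least as strong as the one that follows by just knowing that the compiler $\cdot{\downarrow}$ is CC~.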
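The construction of the derived target attacker in Theorem 4.1 can be illustrated on the timing example with a small executable sketch. Everything below is an assumption made for illustration: the finite universe (`SOURCE_OUTPUTS`, `TIMES`), the choice of the identity closure as source attacker `rho_s`, and all function names are hypothetical and not part of the formal development.

```python
from itertools import product

# Hypothetical finite universe: source outputs and execution times.
SOURCE_OUTPUTS = {"a", "b"}
TIMES = {0, 1, 2}
# Target outputs pair a source output with an execution time.
TARGET_OUTPUTS = set(product(SOURCE_OUTPUTS, TIMES))


def related(s, t):
    """Output relation of the timing example: t = (s, n) for some n."""
    return t[0] == s


def f(pi_t):
    """Source outputs related to at least one target output in pi_t."""
    return {s for s in SOURCE_OUTPUTS if any(related(s, t) for t in pi_t)}


def g(pi_s):
    """Target outputs all of whose related source outputs lie in pi_s."""
    return {
        t
        for t in TARGET_OUTPUTS
        if all(s in pi_s for s in SOURCE_OUTPUTS if related(s, t))
    }


def rho_s(pi_s):
    """Assumed source attacker: the identity closure (observes outputs exactly)."""
    return set(pi_s)


def rho_t_sharp(pi_t):
    """Derived target attacker, computed as g after rho_s after f."""
    return g(rho_s(f(pi_t)))


fast = rho_t_sharp({("a", 0)})
slow = rho_t_sharp({("a", 2)})
```

Under these assumptions, `fast` and `slow` are the same set, so the derived attacker cannot tell apart two runs that differ only in execution time, while runs with different source outputs remain distinguishable.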


Noninterference and Undefined Behavior. As stated above, Theorem 4.1 does not
apply to several scenarios from §3 such as undefined behavior (§3.1), as in those cases
the relation ~ is not a total map. Nevertheless, we can still exploit our framework to
reason about the impact of compilation on noninterference.

Let us consider $\sim\ \triangleq \langle \sim^{\bullet}, \sim^{\circ} \rangle$ where $\sim^{\bullet}$ is any total and surjective map from target to
source inputs (e.g., equality) and $\sim^{\circ}$ is defined as $s^{\circ} \sim^{\circ} t^{\circ} \equiv s^{\circ} = t^{\circ} \vee \exists m^{\circ} \leq t^{\circ}.\ s^{\circ} =
m^{\circ} \cdot \mathit{Goes\_wrong}$. Intuitively, a CC~ compiler guarantees that no interference can be
observed by a target attacker that cannot exploit undefined behavior to learn private
information. This intuition can be made formal by the following theorem.
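The output relation for undefined behavior can be sketched as a small predicate. This is only an illustration under stated assumptions: output traces are modelled as Python tuples, `GOES_WRONG` is a hypothetical marker standing for the Goes_wrong event, and the function names are not taken from the formal development.

```python
GOES_WRONG = "Goes_wrong"  # hypothetical marker for undefined behavior


def is_prefix(m, t):
    """True when the tuple m is a prefix of the tuple t."""
    return len(m) <= len(t) and t[: len(m)] == m


def related_outputs(s, t):
    """s ~ t iff s equals t, or s is some prefix of t followed by Goes_wrong."""
    if s == t:
        return True
    return len(s) > 0 and s[-1] == GOES_WRONG and is_prefix(s[:-1], t)
```

For instance, the source trace `(1, GOES_WRONG)` is related to every target trace that starts with `1`, which models the freedom a compiler has once the source program has encountered undefined behavior.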


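Theorem 4.2 (Relaxed Compiling ANI). Relax the assumptions of Theorem 4.1 by
allowing $\sim^{\circ}$ to be any output trace relation. If $W$ satisfies $\mathrm{ANI}_{\varphi_S}^{\rho_S}$, then $W{\downarrow}$ satisfies
$\mathrm{ANI}_{\varphi_T^{\#}}^{\rho_T}$, where $\varphi_T^{\#}$ is defined as in Theorem 4.1, and $\rho_T$ is such that:

$\forall s\, t.\ s^{\circ} \sim^{\circ} t^{\circ} \Rightarrow \rho_T(t^{\circ}) = \rho_T(\tilde{\tau}^{\circ}(\rho_S(s^{\circ})))$.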


Technically, instead of giving us a definition of $\rho_T$, the theorem gives a property of it.
The property states that, given a target output trace $t^{\circ}$, the attacker cannot distinguish it
from any other target output traces produced by other possible compilations ($\tilde{\tau}^{\circ}$) of the
source trace $s$ it relates to, up to the observational power of the source level attacker $\rho_S$.
Therefore, given a source attacker $\rho_S$, the theorem characterizes a family of attackers
that cannot observe any interference for a correctly compiled noninterfering program.
Notice that the target attacker $\rho_T = \lambda\_.\ \top$ satisfies the premise of the theorem, but
defines a trivial hyperproperty, so that we cannot prove in general that $\mathrm{ANI}_{\varphi_T^{\#}}^{\rho_T} \subseteq \mathit{Cl}_{\subseteq} \circ
\tilde{\tau}(\mathrm{ANI}_{\varphi_S}^{\rho_S})$. The same $\rho_T = \lambda\_.\ \top$ shows that the family of attackers described in
Theorem 4.2 is nonempty, and this ensures the existence of a most powerful attacker
among them [21], whose explicit characterization we leave for future work.


From Target NI to Source NI. We now explore the dual question: under what hy- 
potheses does trace-relating compiler correctness alone allow target noninterference to 
be reduced to source noninterference? This is of practical interest, as one would be able 
to protect from target attackers by ensuring noninterference in the source. This task can 
be made easier if the source language has some static enforcement mechanism [1, 36]. 

Let us consider the languages from §3.4 extended with inputting of (pairs of) values.
It is easy to show that the compiler described in §3.4 is still CC~. Assume that we want
to satisfy a given notion of target noninterference after compilation, i.e., $W{\downarrow} \models \mathrm{ANI}_{\varphi_T}^{\rho_T}$.
Recall that the observational power of the target attacker, $\rho_T$, is expressed as a property
of sequences of values. To express the same property (or attacker) in the source, we
have to abstract the way pairs of values are nested. For instance, the source attacker
should not distinguish $(v_1, (v_2, v_3))$ and $((v_1, v_2), v_3)$. In general (i.e., when ~ is not
the identity), this argument is valid only when $\varphi_T$ can be represented in the source.
More precisely, $\varphi_T$ must consider as equivalent all target inputs that are related to the
same source one, because in the source it is not possible to have a finer distinction of
inputs. This intuitive correspondence can be formalized as follows:


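Theorem 4.3 (Target ANI by source ANI). Let $\varphi_T \in \mathit{uco}(2^{\mathit{Trace}_T^{\bullet}})$, $\rho_T \in \mathit{uco}(2^{\mathit{Trace}_T^{\circ}})$,
let $\sim^{\circ}$ be a total and surjective map from source outputs to target ones, and assume that

$\forall s\, t.\ s^{\bullet} \sim^{\bullet} t^{\bullet} \Rightarrow \varphi_T(t^{\bullet}) = \varphi_T(\tilde{\tau}^{\bullet}(s^{\bullet}))$.

If $\cdot{\downarrow}$ is a CC~ compiler and $W$ satisfies $\mathrm{ANI}_{\varphi_S^{\#}}^{\rho_S^{\#}}$, then $W{\downarrow}$ satisfies $\mathrm{ANI}_{\varphi_T}^{\rho_T}$ for

$\varphi_S^{\#} = \tilde{\sigma}^{\bullet} \circ \varphi_T \circ \tilde{\tau}^{\bullet}$; $\rho_S^{\#} = \tilde{\sigma}^{\circ} \circ \rho_T \circ \tilde{\tau}^{\circ}$.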
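The nesting example that motivates Theorem 4.3 can be made concrete with a short sketch. It assumes that source values are either atoms or Python 2-tuples standing for pairs, and that the attacker's view is the flattened sequence of atoms; the names `flatten` and `source_attacker_view` are hypothetical.

```python
def flatten(value):
    """Flatten arbitrarily nested pairs into the list of their leaves."""
    if isinstance(value, tuple):
        left, right = value  # pairs are modelled as 2-tuples
        return flatten(left) + flatten(right)
    return [value]


def source_attacker_view(value):
    """Hypothetical source observer that sees only the flattened sequence."""
    return tuple(flatten(value))


left_nested = ((1, 2), 3)
right_nested = (1, (2, 3))
```

Both nestings yield the same view, which is the kind of identification a source attacker has to make in order to express a target attacker defined on sequences of values.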


To wrap up the discussion about noninterference, the results presented in this section 
formalize and generalize some intuitive facts about compiler correctness and noninter- 
ference. Of course, they all place some restrictions on the shape of the noninterference 
instances that can be considered, because compiler correctness alone is in general not a 
strong enough criterion for dealing with many security properties [6, 17]. 


5 Trace-Relating Secure Compilation 


So far we have studied compiler correctness criteria for whole, standalone programs. 
However, in practice, programs do not exist in isolation, but in a context where they in- 
teract with other programs, libraries, etc. In many cases, this context cannot be assumed 
to be benign and could instead behave maliciously to try to disrupt a compiled program. 

Hence, in this section we consider the following secure compilation scenario: a 
source program is compiled and linked with an arbitrary target-level context, i.e., one 
that may not be expressible as the compilation of a source context. Compiler correctness
does not address this case, as it does not consider arbitrary target contexts, looking 
instead at whole programs (empty context [33]) or well-behaved target contexts that 
behave like source ones (as in compositional compiler correctness [27, 30, 45, 57]). 

To account for this scenario, Abate et al. [2] describe several secure compilation
criteria based on the preservation of classes of (hyper)properties (e.g., trace properties,
safety, hypersafety, hyperproperties, etc.) against arbitrary target contexts. For each of
these criteria, they give an equivalent “property-free” criterion, analogous to the
equivalence between TP and CC~. For instance, their robust trace property preservation
criterion (RTP) states that, for any trace property $\pi$, if a source partial program $P$ plugged
into any context $C_S$ satisfies $\pi$, then the compiled program $P{\downarrow}$ plugged into any target
context $C_T$ satisfies $\pi$. Their equivalent criterion to RTP is RTC, which states that for
any trace produced by the compiled program, when linked with any target context, there
is a source context that produces the same trace. Formally (writing $C[P]$ to mean the
whole program that results from linking partial program $P$ with context $C$) they define:

$\mathrm{RTP} \equiv \forall P.\ \forall \pi.\ (\forall C_S.\ \forall t.\ C_S[P] \rightsquigarrow t \Rightarrow t \in \pi) \Rightarrow (\forall C_T.\ \forall t.\ C_T[P{\downarrow}] \rightsquigarrow t \Rightarrow t \in \pi)$;
$\mathrm{RTC} \equiv \forall P.\ \forall C_T.\ \forall t.\ C_T[P{\downarrow}] \rightsquigarrow t \Rightarrow \exists C_S.\ C_S[P] \rightsquigarrow t$.

In the following we adopt the notation $P \models_R \pi$ to mean “$P$ robustly satisfies $\pi$,” i.e., $P$
satisfies $\pi$ irrespective of the contexts it is linked with. Thus, we write more compactly:
$\mathrm{RTP} \equiv \forall \pi.\ \forall P.\ P \models_R \pi \Rightarrow P{\downarrow} \models_R \pi$.
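The two criteria can be checked mechanically on a finite toy model. The sketch below fixes a single program and assumes that its behavior is given as a dictionary from (hypothetical) context names to the set of traces produced when linked with that context; this encoding, the sample data, and the function names are illustrative assumptions only.

```python
def rtc(source_runs, target_runs):
    """RTC for one program: every trace produced under any target context
    is also produced under some source context."""
    source_traces = set().union(*source_runs.values())
    return all(t in source_traces for ts in target_runs.values() for t in ts)


def robustly_satisfies(runs, prop):
    """P robustly satisfies prop: all traces under all contexts are in prop."""
    return all(t in prop for ts in runs.values() for t in ts)


# Hypothetical behaviors of P (source) and of its compilation (target).
source_runs = {"C_S1": {("out", 1)}, "C_S2": {("out", 2)}}
target_runs = {"C_T1": {("out", 1)}, "C_T2": {("out", 1), ("out", 2)}}
prop = {("out", 1), ("out", 2)}

# A target context producing a trace that no source context can produce.
leaky_target_runs = {"C_T3": {("leak",)}}
```

With this data, `rtc(source_runs, target_runs)` holds and the property `prop`, which is robustly satisfied in the source, is also robustly satisfied in the target; `leaky_target_runs` violates RTC.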

All the criteria of Abate et al. [2] share this flavor of stating the existence of some
source context that simulates the behavior of any given target context, with some
variations depending on the class of (hyper)properties under consideration. All these criteria
are stated in a setting where source and target traces are the same. In this section, we
extend their result to our trace-relating setting, obtaining trinitarian views for secure
compilation. Despite the similarities with §2, more challenges show up, in particular when
considering the robust preservation of proper sub-classes of trace properties. For
example, after application of $\tilde{\sigma}$ or $\tilde{\tau}$, a property may not be safety anymore, a crucial point for
the equivalence with the property-free criterion for safety properties by Abate et al. [2].
We solve this by interpreting the class of safety properties as an abstraction of the class
of all trace properties induced by a closure operator (§5.1). The remaining subsections
provide example compilation chains satisfying our trace-relating secure compilation
criteria for trace properties (§5.2) and for safety properties and hypersafety (§5.3).


5.1 Trace-Relating Secure Compilation: A Spectrum of Trinities 


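In this subsection we generalize many of the criteria of Abate et al. [2] using the ideas
of §2. Before discussing how we solve the challenges for classes such as safety and
hypersafety, we show the simple generalization of RTC to the trace-relating setting
(RTC~) and its corresponding trinitarian view (Theorem 5.1):


Theorem 5.1 (Trinity for Robust Trace Properties #). For any trace relation ~ and
induced property mappings $\tilde{\tau}$ and $\tilde{\sigma}$, we have: $\mathrm{RTP}^{\tilde{\tau}} \iff \mathrm{RTC}^{\sim} \iff \mathrm{RTP}^{\tilde{\sigma}}$, where


$\mathrm{RTC}^{\sim} \equiv \forall P\ \forall C_T\ \forall t.\ C_T[P{\downarrow}] \rightsquigarrow t \Rightarrow \exists C_S\ \exists s \sim t.\ C_S[P] \rightsquigarrow s$;
$\mathrm{RTP}^{\tilde{\tau}} \equiv \forall P\ \forall \pi_S \in 2^{\mathit{Trace}_S}.\ P \models_R \pi_S \Rightarrow P{\downarrow} \models_R \tilde{\tau}(\pi_S)$;
$\mathrm{RTP}^{\tilde{\sigma}} \equiv \forall P\ \forall \pi_T \in 2^{\mathit{Trace}_T}.\ P \models_R \tilde{\sigma}(\pi_T) \Rightarrow P{\downarrow} \models_R \pi_T$.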


Abate et al. [2] propose many more equivalent pairs of criteria, each preserving different 
classes of (hyper)properties, which we briefly recap now. For trace properties, they also 
have criteria that preserve safety properties plus their version of liveness properties. For 
hyperproperties, they have criteria that preserve hypersafety properties, subset-closed 
hyperproperties, and arbitrary hyperproperties. Finally, they define relational hyper- 
properties, which are relations between the behaviors of multiple programs for express- 
ing, e.g., that a program always runs faster than another. For relational hyperproperties, 
they have criteria that preserve arbitrary relational properties, relational safety proper- 
ties, relational hyperproperties and relational subset-closed hyperproperties. Roughly 
speaking, the security guarantees due to robust preservation of trace properties regard 
only protecting the integrity of the program from the context, the guarantees of hyper- 
properties also regard data confidentiality, and the guarantees of relational hyperprop- 
erties even regard code confidentiality. Naturally, these stronger guarantees are increas- 
ingly harder to enforce and prove. 

While we have lifted the most significant criteria from Abate et al. [2] to our trini- 
tarian view, due to space constraints we provide the formal definitions only for the two 
most interesting criteria. We summarize the generalizations of many other criteria in 
Figure 2, described at the end. Omitted definitions are available in the online appendix. 


Beyond Trace Properties: Robust Safety and Hyperproperty Preservation. We 
detail robust preservation of safety properties and of arbitrary hyperproperties since they 
are both relevant from a security point of view and their generalization is interesting. 


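Theorem 5.2 (Trinity for Robust Safety Properties #). For any trace relation ~
and for the induced property mappings $\tilde{\tau}$ and $\tilde{\sigma}$, we have:


$\mathrm{RTP}^{\mathit{Safe} \circ \tilde{\tau}} \iff \mathrm{RSC}^{\sim} \iff \mathrm{RSP}^{\tilde{\sigma}}$, where
$\mathrm{RSC}^{\sim} \equiv \forall P\ \forall C_T\ \forall t\ \forall m \leq t.\ C_T[P{\downarrow}] \rightsquigarrow t \Rightarrow \exists C_S\ \exists t' \geq m\ \exists s \sim t'.\ C_S[P] \rightsquigarrow s$;
$\mathrm{RTP}^{\mathit{Safe} \circ \tilde{\tau}} \equiv \forall P\ \forall \pi_S \in 2^{\mathit{Trace}_S}.\ P \models_R \pi_S \Rightarrow P{\downarrow} \models_R (\mathit{Safe} \circ \tilde{\tau})(\pi_S)$;
$\mathrm{RSP}^{\tilde{\sigma}} \equiv \forall P\ \forall \pi_T \in \mathit{Safety}_T.\ P \models_R \tilde{\sigma}(\pi_T) \Rightarrow P{\downarrow} \models_R \pi_T$.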


There is an interesting asymmetry between the last two characterizations above, which
we explain now in more detail. $\mathrm{RSP}^{\tilde{\sigma}}$ quantifies over target safety properties, while
$\mathrm{RTP}^{\mathit{Safe} \circ \tilde{\tau}}$ quantifies over arbitrary source properties, but imposes the composition of
$\tilde{\tau}$ with $\mathit{Safe}$, which maps an arbitrary target property $\pi_T$ to the target safety property
that best over-approximates $\pi_T$ (see footnote 8; an analogous closure was needed for subset-closed
hyperproperties in Theorem 2.11). More precisely, $\mathit{Safe}$ is a closure operator on target
properties, with $\mathit{Safety}_T = \{ \mathit{Safe}(\pi_T) \mid \pi_T \in 2^{\mathit{Trace}_T} \}$. The mappings
$\mathit{Safe} \circ \tilde{\tau} : 2^{\mathit{Trace}_S} \leftrightarrows \mathit{Safety}_T : \tilde{\sigma}$
determine a Galois connection between source trace properties and target safety
properties, and ensure the equivalence $\mathrm{RTP}^{\mathit{Safe} \circ \tilde{\tau}} \iff \mathrm{RSP}^{\tilde{\sigma}}$ (#). This argument
generalizes to arbitrary closure operators on target properties (#) and on hyperproperties,
as long as the corresponding class is a sub-class of subset-closed hyperproperties, and
explains all but one of the asymmetries in Figure 2, the one that concerns the robust
preservation of arbitrary hyperproperties:

8 $\mathit{Safe}(\pi_T) = \bigcap \{ S_T \mid \pi_T \subseteq S_T \wedge S_T \in \mathit{Safety}_T \}$ is the topological closure in the
topology of Clarkson and Schneider [14], where safety properties coincide with the closed sets.


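Theorem 5.3 (Weak Trinity for Robust Hyperproperties #). For a trace relation
$\sim\ \subseteq \mathit{Trace}_S \times \mathit{Trace}_T$ and induced property mappings $\tilde{\sigma}$ and $\tilde{\tau}$, $\mathrm{RHC}^{\sim}$ is equivalent
to $\mathrm{RHP}^{\tilde{\tau}}$; moreover, if $\tilde{\tau} \leftrightarrows \tilde{\sigma}$ is a Galois insertion (i.e., $\tilde{\tau} \circ \tilde{\sigma} = \mathit{id}$), $\mathrm{RHC}^{\sim}$ implies
$\mathrm{RHP}^{\tilde{\sigma}}$, while if $\tilde{\sigma} \leftrightarrows \tilde{\tau}$ is a Galois reflection (i.e., $\tilde{\sigma} \circ \tilde{\tau} = \mathit{id}$), $\mathrm{RHP}^{\tilde{\sigma}}$ implies $\mathrm{RHC}^{\sim}$,
where
$\mathrm{RHC}^{\sim} \equiv \forall P\ \forall C_T\ \exists C_S\ \forall t.\ C_T[P{\downarrow}] \rightsquigarrow t \iff (\exists s \sim t.\ C_S[P] \rightsquigarrow s)$;
$\mathrm{RHP}^{\tilde{\tau}} \equiv \forall P\ \forall H_S.\ P \models_R H_S \Rightarrow P{\downarrow} \models_R \tilde{\tau}(H_S)$;
$\mathrm{RHP}^{\tilde{\sigma}} \equiv \forall P\ \forall H_T.\ P \models_R \tilde{\sigma}(H_T) \Rightarrow P{\downarrow} \models_R H_T$.


This trinity is weak since extra hypotheses are needed to prove some implications.
While the equivalence $\mathrm{RHC}^{\sim} \iff \mathrm{RHP}^{\tilde{\tau}}$ holds unconditionally, the other two
implications hold only under distinct, stronger assumptions. For $\mathrm{RHP}^{\tilde{\sigma}}$ it is still possible
and correct to deduce a source obligation for a given target hyperproperty $H_T$ when no
information is lost in the composition $\tilde{\tau} \circ \tilde{\sigma}$ (i.e., the two maps are a Galois
insertion). On the other hand, $\mathrm{RHP}^{\tilde{\tau}}$ is a consequence of $\mathrm{RHP}^{\tilde{\sigma}}$ when no information is lost
in composing in the other direction, $\tilde{\sigma} \circ \tilde{\tau}$ (i.e., the two maps are a Galois reflection).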


Navigating the Diagram. For a given trace relation ~, Figure 2 orders the generalized 
criteria according to their relative strength. If a trinity implies another (denoted by =>), 
then the former provides stronger security for a compilation chain than the latter. 

As mentioned, some property-full criteria regarding proper subclasses (i.e., subset-closed
hyperproperties, safety, hypersafety, 2-relational safety and 2-relational
hyperproperties) quantify over arbitrary (relational) (hyper)properties and compose $\tilde{\tau}$ with
an additional operator. We have already presented the $\mathit{Safe}$ operator; other operators
are $\mathit{Cl}_{\subseteq}$, $\mathit{HSafe}$, and $\mathit{2rSafe}$, which approximate the image of $\tilde{\tau}$ with a subset-closed
hyperproperty, a hypersafety property, and a 2-relational safety property, respectively.

As a reading aid, Figure 2 uses a shaded blue background when quantifying over arbitrary
trace properties, red when quantifying over arbitrary subset-closed
hyperproperties, and green for arbitrary 2-relational properties.

We now describe how to interpret the acronyms in Figure 2. All criteria start with R,
meaning they refer to robust preservation. Criteria for relational hyperproperties (here
only arity 2 is shown) contain 2r. Next, criteria names spell the class of hyperproperties
they preserve: H for hyperproperties, SCH for subset-closed hyperproperties, HS for
hypersafety, T for trace properties, and S for safety properties. Finally, property-free
criteria end with a C while property-full ones involving $\tilde{\sigma}$ and $\tilde{\tau}$ end with P. Thus,
robust (R) subset-closed hyperproperty-preserving (SCH) compilation (C) is RSCHC~,
robust (R) two-relational (2r) safety-preserving (S) compilation (C) is R2rSC~, etc.
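The naming scheme just described is mechanical enough to be written down as a small helper. The sketch below is only a reading aid: the function `criterion_name` and the table `CLASSES` are hypothetical, and the trace-relation superscript (~) is omitted from the generated names.

```python
# Class letters used in the acronyms, as listed in the text.
CLASSES = {
    "H": "hyperproperties",
    "SCH": "subset-closed hyperproperties",
    "HS": "hypersafety",
    "T": "trace properties",
    "S": "safety properties",
}


def criterion_name(cls, relational=False, property_free=True):
    """Assemble an acronym: R, optional 2r, the class letters, then C or P."""
    if cls not in CLASSES:
        raise ValueError(f"unknown class of (hyper)properties: {cls}")
    arity = "2r" if relational else ""
    suffix = "C" if property_free else "P"
    return "R" + arity + cls + suffix
```

For example, `criterion_name("SCH")` yields `RSCHC` and `criterion_name("S", relational=True)` yields `R2rSC`, matching the two names spelled out above.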


5.2 Instance of Trace-Relating Robust Preservation of Trace Properties 


This subsection illustrates trace-relating secure compilation when the target language 
has strictly more events than the source that target contexts can exploit to break security. 


Source and Target Languages. The source and target languages used here are nearly
identical expression languages, borrowing from the syntax of the source language of
§3.3. Both languages add sequencing of expressions, two kinds of output events, and
the expressions that generate them: $\mathtt{out}_S\ n$, usable in both source and target, and
$\mathtt{out}_T\ n$, usable only in the target, which is the only difference between
source and target. The extra events in the target model the fact that the target language
has an increased ability to perform certain operations, some of them potentially
dangerous (such as writing to the hard drive), which cannot be performed by the source
language, and against which source-level reasoning can therefore offer no protection.

Fig. 2: Hierarchy of trinitarian views of secure compilation criteria preserving classes
of hyperproperties and the key to read each acronym. Shorthands ‘Ins.’ and ‘Refl.’ stand
for Galois Insertion and Reflection. The # symbol denotes trinities proven in Coq.


Both languages and compilation chains now deal with partial programs, contexts 
and linking of those two to produce whole programs. In this setting, a whole program 
is the combination of a main expression to be evaluated and a set of function definitions 
(with distinct names) that can refer to their argument symbolically and can be called by 
the main expression and by other functions. The set of functions of a whole program 
is the union of the functions of a partial program and a context; the latter also contains 
the main expression. The extensions of the typing rules and the operational semantics 
for whole programs are unsurprising and therefore elided. The trace model also follows 
closely that of §3.3: it consists of a list of regular events (including the new outputs) 
terminated by a result event. Finally, a partial program and a context can be linked into 
a whole program when their functions satisfy the requirements mentioned above. 


Relating Traces. In the present model, source and target traces differ only in the fact
that the target draws (regular) events from a strictly larger set than the source, i.e.,
$\Sigma_T \supset \Sigma_S$. A natural relation between source and target traces essentially maps to a
given target trace $t$ the source trace that erases from $t$ those events that exist only at the
target level. Let $t|_{\Sigma_S}$ indicate trace $t$ filtered to retain only those elements included in
alphabet $\Sigma_S$. We define the trace relation as:
$s \sim t \equiv s = t|_{\Sigma_S}$.
In the opposite direction, a source trace $s$ is related to many target ones, as any
target-only events can be inserted at any point in $s$. The induced mappings for ~ are:
$\tilde{\tau}(\pi_S) = \{ t \mid \exists s.\ s = t|_{\Sigma_S} \wedge s \in \pi_S \}$; $\tilde{\sigma}(\pi_T) = \{ s \mid \forall t.\ s = t|_{\Sigma_S} \Rightarrow t \in \pi_T \}$.

That is, the target guarantee of a source property is that the target has the same 
source-level behavior, sprinkled with arbitrary target-level behavior. Conversely, the 
source-level obligation of a target property is the aggregate of those source traces all of 
whose target-level enrichments are in the target property. 
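The two induced mappings can be computed explicitly on a bounded universe of traces. The following sketch assumes a hypothetical one-letter source alphabet, a single target-only event `"x"`, and traces of length at most `MAX_LEN` (a bound introduced only to keep the universe finite); result events are left out for simplicity.

```python
from itertools import product

SIGMA_S = {"a"}        # hypothetical source alphabet
SIGMA_T = {"a", "x"}   # target alphabet: adds the target-only event "x"
MAX_LEN = 3            # bound on trace length, to keep the universe finite


def all_traces(alphabet, max_len):
    """All traces over the alphabet of length at most max_len, as tuples."""
    return {
        trace
        for n in range(max_len + 1)
        for trace in product(sorted(alphabet), repeat=n)
    }


SOURCE = all_traces(SIGMA_S, MAX_LEN)
TARGET = all_traces(SIGMA_T, MAX_LEN)


def erase(t):
    """Filter a target trace down to the events of the source alphabet."""
    return tuple(e for e in t if e in SIGMA_S)


def tau(pi_s):
    """Target guarantee: target traces whose erasure is in pi_s."""
    return {t for t in TARGET if erase(t) in pi_s}


def sigma(pi_t):
    """Source obligation: source traces all of whose enrichments are in pi_t."""
    return {s for s in SOURCE if all(t in pi_t for t in TARGET if erase(t) == s)}


pi_s = {("a",)}
enriched = tau(pi_s)
```

Here `enriched` contains the trace `("a",)` together with all bounded traces obtained by inserting `"x"` events around it, and `sigma(enriched)` gives back `pi_s`. On this universe one can also check the adjunction `tau(p) <= q` exactly when `p <= sigma(q)` for sample properties `p` and `q`.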

Since $R^S$ and $R^T$ are very similar, it is simple to prove that the identity compiler
($\cdot{\downarrow}$) from $R^S$ to $R^T$ is secure according to the trace relation ~ defined above.


Theorem 5.4 ($\cdot{\downarrow}$ is Secure #). $\cdot{\downarrow}$ is RTC~.


5.3 Instances of Trace-Relating Robust Preservation of Safety and Hypersafety 


To provide examples of cross-language trace-relations that preserve safety and hyper- 
safety properties, we show how existing secure compilation results can be interpreted in 
our framework. This indicates how the more general theory developed here can already 
be instantiated to encompass existing results, and that existing proof techniques can be 
used in order to achieve the secure compilation criteria we define. 

For the preservation of safety, Patrignani and Garg [50] study a compiler from a
typed, concurrent WHILE language to an untyped, concurrent WHILE language with
support for memory capabilities. As in §3.3, their source has bools and nats while
their target only has nats. Additionally, their source has an ML-like memory (where
the domain is locations $\ell$) while their target has an assembly-like memory (where the
domain is natural numbers $n$). Their traces consider context-program interactions and
as such they are concatenations of call and return actions with parameters, which can
include booleans as well as locations. Because of the aforementioned differences, they
need a cross-language relation to relate source and target actions.

Besides defining a relation on traces (i.e., an instance of ~), they also define a
relation between source and target safety properties. They provide an instantiation of $\tilde{\tau}$
that maps all safe source traces to the related target ones. This ensures that no additional
target trace is introduced in the target property, and source safety properties are mapped
to target safety ones by $\tilde{\tau}$. Their compiler is then proven to generate code that respects
$\tilde{\tau}$, so they achieve a variation of $\mathrm{RTP}^{\mathit{Safe} \circ \tilde{\tau}}$.


Concerning the preservation of hypersafety, Patrignani and Garg [49] consider compilers
in a reactive setting where traces are sequences of input ($\alpha?$) and output ($\alpha!$) actions.
In their setting, traces are different between source and target, so they define a
cross-language relation on actions that is total on the source actions and injective.
Additionally, their set of target output actions is strictly larger than the source one, as it
includes a special action $\surd$, which is how compiled code must respond to invalid target
inputs (i.e., receiving a bool when a nat was expected). Starting from the relation on
actions, they define TPC, which is an instance of what we call $\tilde{\tau}$. Informally, given a set
of source traces, TPC generates all target traces that are related (pointwise) to a source
trace. Additionally, it generates all traces with interleavings of undesired inputs $\alpha?$
followed by $\surd$ as long as removing $\alpha?\surd$ leaves a trace that relates to the source trace.
TPC preserves hypersafety across languages, i.e., it is an instance of $\mathrm{RSCHP}^{\mathit{HSafe} \circ \tilde{\tau}}$
mapping source hypersafety to target hypersafety (and safety to safety).


6 Related Work 


We already discussed how our results relate to some existing work in correct compila- 
tion [33, 58] and secure compilation [2, 49, 50]. We also already mentioned that most 
of our definitions and results make no assumptions about the structure of traces. One 
result that relies on the structure of traces is Theorem 5.2, which involves some finite 
prefix m, suggesting traces should be some sort of sequences of events (or states), as 
customary when one wants to refer to safety properties [14]. It is however sufficient 
to fix a topology on properties where safety properties coincide with closed sets [46]. 
Even for reasoning about safety, hypersafety, or arbitrary hyperproperties, traces can 
therefore be values, sequences of program states, or of input output events, or even the 
recently proposed interaction trees [62]. In the latter case we believe that the compila- 
tion from IMP to ASM proposed by Xia et al. [62] can be seen as an instance of HC~, 
for the relation they call “trace equivalence.” 


Compilers Where Our Work Could Be Useful. Our work should be broadly applica- 
ble to understanding the guarantees provided by many verified compilers. For instance, 
Wang et al. [61] recently proposed a CompCert variant that compiles all the way down 
to machine code, and it would be interesting to see if the model at the end of §3.1 applies 
there too. This and many other verified compilers [12, 29, 42, 56] beyond CakeML [58] 
deal with resource exhaustion and it would be interesting to also apply the ideas of §3.2
to them. Hur and Dreyer [27] devised a correct compiler from an ML language to as- 
sembly using a cross-language logical relation to state their CC theorem. They do not 
have traces, though were one to add them, the logical relation on values would serve as 
the basis for the trace relation and therefore their result would attain CC~. 

Switching to more informative traces capturing the interaction between the program 
and the context is often used as a proof technique for secure compilation [2, 28, 48]. 
Most of these results consider a cross-language relation, so they probably could be 
proved to attain one of the criteria from Figure 2. 


Generalizations of Compiler Correctness. The compiler correctness definition of 
Morris [41] was already general enough to account for trace relations, since it consid- 
ered a translation between the semantics of the source program and that of the compiled 
program, which he called “decode” in his diagram, reproduced in Figure 3 (left). And 
even some of the more recent compiler correctness definitions preserve this kind of flex- 
ibility [51]. While CC~ can be seen as an instance of a definition by Morris [41], we are 
not aware of any prior work that investigated the preservation of properties when the 
“decode translation” is neither the identity nor a bijection, and source properties need 
to be re-interpreted as target ones and vice versa. 


Correct Compilation and Galois Connections. Melton et al. [38] and Sabry and
Wadler [55] expressed a strong variant of compiler correctness using the diagram of
Figure 3 (right) [38, 55]. They require that compiled programs parallel the computation
steps of the original source programs, which can be proven by showing the existence of a
decompilation map # that makes the diagram commute, or equivalently, the existence
of an adjoint for $\downarrow$ ($W \leq W' \iff W \rightarrow W'$ for both source and target). The
“parallel” intuition can be formalized as an instance of CC~. Take source and target
traces to be finite or infinite sequences of program states (maximal trace semantics
[15]), and relate them exactly like Melton et al. [38] and Sabry and Wadler [55].

[Figure 3, diagrams not reproduced. Left: source language, source meanings, target
language, and target meanings, connected by the arrows "source semantics", "compile",
"decode", and "target semantics". Right: a square on programs W and Z in source and target.]
Fig. 3: Morris’s [41] (left) and Melton et al.’s [38] and Sabry and Wadler’s [55] (right)


Translation Validation. Translation validation is an important alternative to proving 
that all runs of a compiler are correct. A variant of CC~ for translation validation can 
simply be obtained by specializing the definition to a particular W, and one can obtain 
again the same trinitarian view. Similarly for our other criteria, including our extensions 
of the secure compilation criteria of Abate et al. [2], which Busi et al. [10] seem to 
already be considering in the context of translation validation. 


7 Conclusion and Future Work 


We have extended the property preservation view on compiler correctness to arbitrary 
trace relations, and believe that this will be useful for understanding the guarantees var- 
ious compilers provide. An open question is whether, given a compiler, there exists a 
most precise ~ relation for which this compiler is correct. As mentioned in §1, every 
compiler is CC~ for some ~, but under which conditions is there a most precise rela- 
tion? In practice, more precision may not always be better though, as it may be at odds 
with compiler efficiency and may not align with more subjective notions of usefulness, 
leading to tradeoffs in the selection of suitable relations. Finally, another interesting 
direction for future work is studying whether using the relation to Galois connections
makes it easier to compose trace relations for different purposes, say, for a compiler
whose target language has undefined behavior, resource exhaustion, and side-channels. 
In particular, are there ways to obtain complex relations by combining simpler ones in 
a way that eases the compiler verification burden? 
Abstract. Runners of algebraic effects, also known as comodels, provide
a mathematical model of resource management. We show that they
also give rise to a programming concept that models top-level external
resources, as well as allows programmers to modularly define their own
intermediate “virtual machines”. We capture the core ideas of programming
with runners in an equational calculus $\lambda_{coop}$, which we equip with
a sound and coherent denotational semantics that guarantees the linear
use of resources and execution of finalisation code. We accompany
$\lambda_{coop}$ with examples of runners in action, provide a prototype language
implementation in OCAML, as well as a HASKELL library based on $\lambda_{coop}$.


Keywords: Runners, comodels, algebraic effects, resources, finalisation. 


1 Introduction 


Computational effects, such as exceptions, input-output, state, nondeterminism, 
and randomness, are an important component of general-purpose programming 
languages, whether they adopt functional, imperative, object-oriented, or other 
programming paradigms. Even pure languages exhibit computational effects at 
the top level, so to speak, by interacting with their external environment. 

In modern languages, computational effects are often structured using mon- 
ads [22,23,36], or algebraic effects and handlers [12,28,30]. These mechanisms 
excel at implementation of computational effects within the language itself. For 
instance, the familiar implementation of mutable state in terms of state-passing 
functions requires no native state, and can be implemented either as a monad or 
using handlers. One is naturally drawn to using these techniques also for deal- 
ing with actual effects, such as manipulation of native memory and access to 
hardware. These are represented inside the language as algebraic operations (as 
in EFF [4]) or a monad (in the style of HASKELL’s IO), but treated specially by 
the language’s top-level runtime, which invokes corresponding operating system 
functionality. While this approach works in practice, it has some unfortunate 
downsides too, namely lack of modularity and linearity, and excessive generality. 

Lack of modularity is caused by having the external resources hard-coded into 
the top-level runtime. As a result, changing which resources are available and 
how they are implemented requires modifications of the language implementa- 
tion. Additional complications arise when a language supports several operating 
systems and hardware platforms, each providing their own, different feature set. 
One wishes that the ingenuity of the language implementors were better sup-
ported by a more flexible methodology with a sound theoretical footing. 
Excessive generality is not as easily discerned, because generality of program- 
ming concepts makes a language expressive and useful, such as general algebraic 
effects and handlers enabling one to implement timeouts, rollbacks, stream redi- 
rection [30], async & await [16], and concurrency [9]. However, the flip side of such 
expressive freedom is the lack of any guarantees about how external resources 
will actually be used. For instance, consider a simple piece of code, written in 
EFF-like syntax, which first opens a file, then writes to it, and finally closes it: 


let fh = open "hello.txt" in write (fh, "Hello, world."); close fh 


What this program actually does depends on how the operations open, write, 
and close are handled. For all we know, an enveloping handler may intercept the 
write operation and discard its continuation, so that close never happens and 
the file is not properly closed. Telling the programmer not to shoot themselves 
in the foot by avoiding such handlers is not helpful, because the handler may 
encounter an external reason for not being able to continue, say a full disk. 

Even worse, external resources may be misused accidentally when we combine 
two handlers, each of which works as intended on its own. For example, if we 
combine the above code with a non-deterministic choose operation, as in 


let fh = open "greeting.txt" in 
let b = choose () in 
if b then write (fh, "hello") else write (fh, "good bye") ; close fh 


and handle it with the standard non-determinism handler 


handler { return x ↦ [x], choose () k ↦ return (append (k true) (k false)) } 


The resulting program attempts to close the file twice, as well as write to it twice, 
because the continuation k is invoked twice when handling choose. Of course, 
with enough care all such situations can be dealt with, but that is beside the 
point. It is worth sacrificing some amount of the generality of algebraic effects 
and monads in exchange for predictable and safe usage of external computational 
effects, so long as the vast majority of common use cases are accommodated. 
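
The double finalisation described above can be reproduced in a small simulation. The following Python sketch is only an illustration under stated assumptions: computation trees are modelled with the hypothetical classes `Return` and `Op`, `handle_choose` mimics the standard non-determinism handler, and `run_logging` is a made-up top level that merely records which file operations reach it.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Return:
    value: Any

@dataclass
class Op:
    name: str
    arg: Any
    k: Callable[[Any], Any]   # continuation: operation result -> computation

def bind(tree, f):
    """Sequence a computation with a function producing the next computation."""
    if isinstance(tree, Return):
        return f(tree.value)
    return Op(tree.name, tree.arg, lambda r: bind(tree.k(r), f))

def program():
    # let fh = open "greeting.txt" in let b = choose () in
    # (if b then write (fh, "hello") else write (fh, "good bye")); close fh
    return Op("open", "greeting.txt", lambda fh:
           Op("choose", None, lambda b:
           Op("write", (fh, "hello" if b else "good bye"), lambda _:
           Op("close", fh, lambda _: Return(None)))))

def handle_choose(tree):
    """Non-determinism handler: resumes the continuation of choose twice."""
    if isinstance(tree, Return):
        return Return([tree.value])
    if tree.name == "choose":
        left = handle_choose(tree.k(True))
        right = handle_choose(tree.k(False))
        return bind(left, lambda xs: bind(right, lambda ys: Return(xs + ys)))
    # Any other operation is forwarded to the outside.
    return Op(tree.name, tree.arg, lambda r: handle_choose(tree.k(r)))

def run_logging(tree):
    """Hypothetical top level: records every operation that reaches it."""
    log = []
    while isinstance(tree, Op):
        log.append((tree.name, tree.arg))
        tree = tree.k("fh0" if tree.name == "open" else None)
    return tree.value, log

result, log = run_logging(handle_choose(program()))
names = [name for name, _ in log]
```

In this sketch the log contains one `open` but two `write` and two `close` entries, and the second write happens after the first close, which is the misuse the text warns about.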


Contributions We address the described issues by showing how to design a 
programming language based on runners of algebraic effects. We review runners 
in §2 and recast them as a programming construct in §3. In §4, we present $\lambda_{\mathsf{coop}}$, 
a calculus that captures the core ideas of programming with runners. We provide 
a coherent and sound denotational semantics for $\lambda_{\mathsf{coop}}$ in §5, where we also prove 
that well-typed code is properly finalised. In §6, we show examples of runners in 
action. The paper is accompanied by a prototype language Coop and a Haskell 
library Haskell-Coop, based on $\lambda_{\mathsf{coop}}$; see §7. The relationship between $\lambda_{\mathsf{coop}}$ 
and existing work is addressed in §8, and future possibilities are discussed in §9. 
The paper is also accompanied by an online appendix (https://arxiv.org/abs/1910.11629) 
that provides the typing and equational rules we omit in §4. 

Runners are modular in that they can be used not only to model the top-level 
interaction with the external environment, but programmers can also use 
them to define and nest their own intermediate “virtual machines”. Our runners 
are effectful: they may handle operations by calling further outer operations, 
and raise exceptions and send signals, through which exceptional conditions and 
runtime errors are communicated back to user programs in a safe fashion that 
preserves linear usage of external resources and ensures their proper finalisation. 

We achieve suitable generality for handling of external resources by showing 
how runners provide implementations of algebraic operations together with a 
natural notion of finalisation, and a strong guarantee that in the absence of 
external kill signals the finalisation code is executed exactly once (Thm. 7). We 
argue that for most purposes such discipline is well worth having, and giving up 
the arbitrariness of effect handlers is an acceptable price to pay. In fact, as will 
be apparent in the denotational semantics, runners are simply a restricted form 
of handlers, which apply the continuation at most once in a tail call position. 

Runners guarantee linear usage of resources not through a linear or unique- 
ness type system (such as in the CLEAN programming language [15]) or a syntac- 
tic discipline governing the application of continuations in handlers, but rather 
by a design based on the linear state-passing technique studied by Møgelberg 
and Staton [21]. In this approach, a computational resource may be implemented 
without restrictions, but is then guaranteed to be used linearly by user code. 


2 Algebraic effects, handlers, and runners 


We begin with a short overview of the theory of algebraic effects and handlers, 
as well as runners. To keep focus on how runners give rise to a programming 
concept, we work naively in set theory. Nevertheless, we use category-theoretic 
language as appropriate, to make it clear that there are no essential obstacles to 
extending our work to other settings (we return to this point in §5.1). 


2.1 Algebraic effects and handlers 


There is by now no lack of material on the algebraic approach to structuring 
computational effects. For an introductory treatment we refer to [5], while we 
of course also recommend the seminal papers by Plotkin and Power [25,28]. The 
brief summary given here only recalls the essentials and introduces notation. 

An (algebraic) signature is given by a set $\Sigma$ of operation symbols, and for each 
$\mathsf{op} \in \Sigma$ its operation signature $\mathsf{op} : A_{\mathsf{op}} \rightsquigarrow B_{\mathsf{op}}$, where $A_{\mathsf{op}}$ and $B_{\mathsf{op}}$ are called the 
parameter and arity set. A $\Sigma$-structure $M$ is given by a carrier set $|M|$, and 
for each operation symbol $\mathsf{op} \in \Sigma$, a map $\mathsf{op}_M : A_{\mathsf{op}} \times (B_{\mathsf{op}} \Rightarrow |M|) \to |M|$, 
where $\Rightarrow$ is set exponentiation. The free $\Sigma$-structure $\mathsf{Tree}_\Sigma(X)$ over a set $X$ is 
the set of well-founded trees generated inductively by 

- $\mathsf{return}\,x \in \mathsf{Tree}_\Sigma(X)$, for every $x \in X$, and 
- $\mathsf{op}(a, \kappa) \in \mathsf{Tree}_\Sigma(X)$, for every $\mathsf{op} \in \Sigma$, $a \in A_{\mathsf{op}}$, and $\kappa : B_{\mathsf{op}} \to \mathsf{Tree}_\Sigma(X)$. 

We are abusing notation in a slight but standard way, by using $\mathsf{op}$ both as the 
name of an operation and a tree-forming constructor. The elements of $\mathsf{Tree}_\Sigma(X)$ 
are called computation trees: a leaf $\mathsf{return}\,x$ represents a pure computation 
returning a value $x$, while $\mathsf{op}(a, \kappa)$ represents an effectful computation that calls 
$\mathsf{op}$ with parameter $a$ and continuation $\kappa$, which expects a result from $B_{\mathsf{op}}$. 

An algebraic theory $\mathcal{T} = (\Sigma_{\mathcal{T}}, \mathsf{Eq}_{\mathcal{T}})$ is given by a signature $\Sigma_{\mathcal{T}}$ and a set of 
equations $\mathsf{Eq}_{\mathcal{T}}$. The equations $\mathsf{Eq}_{\mathcal{T}}$ express computational behaviour via 
interactions between operations, and are written in a suitable formalism, e.g., [30]. 
We explain these by way of examples, as the precise details do not matter for 
our purposes. Let $0 = \{\}$ be the empty set and $1 = \{\star\}$ the standard singleton. 


Example 1. Given a set $C$ of possible states, the theory of $C$-valued state has 
two operations, whose somewhat unusual naming will become clear later on, 

$$\mathsf{getenv} : 1 \rightsquigarrow C, \qquad \mathsf{setenv} : C \rightsquigarrow 1$$

and the equations (where we elide appearances of $\star$): 

$$\mathsf{getenv}(\lambda c.\, \mathsf{setenv}(c, \kappa)) = \kappa, \qquad \mathsf{setenv}(c, \mathsf{getenv}\,\kappa) = \mathsf{setenv}(c, \kappa\,c),$$

$$\mathsf{setenv}(c, \mathsf{setenv}(c', \kappa)) = \mathsf{setenv}(c', \kappa).$$

For example, the second equation states that reading state right after setting it 
to $c$ gives precisely $c$. The third equation states that $\mathsf{setenv}$ overwrites the state. 
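
The three equations can be sanity-checked against the familiar state-passing reading of the two operations. The Python sketch below is an illustration rather than part of the formal development: it assumes that a computation is a function from a state to a pair of a result and a new state, and the helper names `getenv`, `setenv`, and `agree` are chosen here for the example.

```python
# A computation is modelled as a function: state -> (result, new_state).

def getenv(k):
    """Read the state and pass it to the continuation k : C -> computation."""
    return lambda c: k(c)(c)

def setenv(new_c, k):
    """Overwrite the state, then continue with the computation k
    (the unit argument of the continuation is elided, as in the text)."""
    return lambda _old: k(new_c)

def agree(lhs, rhs, states):
    """Extensional equality of two computations on a finite sample of states."""
    return all(lhs(s) == rhs(s) for s in states)

STATES = range(5)
kappa = lambda s: (f"saw {s}", s + 1)        # a computation
kappa2 = lambda c: lambda s: ((c, s), s)     # a continuation expecting the value read

# getenv(\c. setenv(c, kappa)) = kappa
eq1 = agree(getenv(lambda c: setenv(c, kappa)), kappa, STATES)

# setenv(c, getenv kappa2) = setenv(c, kappa2 c)
eq2 = all(agree(setenv(c, getenv(kappa2)), setenv(c, kappa2(c)), STATES)
          for c in STATES)

# setenv(c, setenv(c', kappa)) = setenv(c', kappa)
eq3 = all(agree(setenv(c, setenv(c2, kappa)), setenv(c2, kappa), STATES)
          for c in STATES for c2 in STATES)
```

Checking finitely many states is of course not a proof, but each equation also follows by unfolding the two definitions by hand.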


Example 2. Given a set of exceptions $E$, the algebraic theory of $E$-many exceptions 
is given by a single operation $\mathsf{raise} : E \rightsquigarrow 0$, and no equations. 


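A $\mathcal{T}$-model, also called a $\mathcal{T}$-algebra, is a $\Sigma_{\mathcal{T}}$-structure which satisfies the 
equations in $\mathsf{Eq}_{\mathcal{T}}$. The free $\mathcal{T}$-model over a set $X$ is constructed as the quotient 

$$\mathsf{Free}_{\mathcal{T}}(X) \stackrel{\text{def}}{=} \mathsf{Tree}_{\Sigma_{\mathcal{T}}}(X)/{\sim}$$

by the $\Sigma_{\mathcal{T}}$-congruence $\sim$ generated by $\mathsf{Eq}_{\mathcal{T}}$. Each $\mathsf{op} \in \Sigma_{\mathcal{T}}$ is interpreted in the 
free model as the map $(a, \kappa) \mapsto [\mathsf{op}(a, \kappa)]$, where $[-]$ is the $\sim$-equivalence class. 
$\mathsf{Free}_{\mathcal{T}}(-)$ is the functor part of a monad on sets, whose unit at a set $X$ is 

$$X \xrightarrow{\ \mathsf{return}\ } \mathsf{Tree}_{\Sigma_{\mathcal{T}}}(X) \xrightarrow{\ [-]\ } \mathsf{Free}_{\mathcal{T}}(X).$$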


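The Kleisli extension for this monad is then the operation which lifts any map 
$f : X \to \mathsf{Free}_{\mathcal{T}}(Y)$ to the map $f^\dagger : \mathsf{Free}_{\mathcal{T}}(X) \to \mathsf{Free}_{\mathcal{T}}(Y)$, given by 

$$f^\dagger[\mathsf{return}\,x] \stackrel{\text{def}}{=} f\,x, \qquad f^\dagger[\mathsf{op}(a, \kappa)] \stackrel{\text{def}}{=} [\mathsf{op}(a, f^\dagger \circ \kappa)].$$

That is, $f^\dagger$ traverses a computation tree and replaces each leaf $\mathsf{return}\,x$ with $f\,x$. 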

The preceding construction of free models and the monad may be retrofitted 
to an algebraic signature $\Sigma$, if we construe $\Sigma$ as an algebraic theory with 
no equations. In this case $\sim$ is just equality, and so we may omit the quotient 
and the pesky equivalence classes. Thus the carrier of the free $\Sigma$-model is the 
set of well-founded trees $\mathsf{Tree}_\Sigma(X)$, with the evident monad structure. 
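
For a signature without equations, the monad structure on computation trees can be written down directly. The following Python sketch is one possible rendering, with the class names `Return` and `Op` and the function name `kleisli` being assumptions made for the example; continuations are represented as Python functions.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Return:
    """Leaf of a computation tree: a pure computation returning a value."""
    value: Any

@dataclass
class Op:
    """Node of a computation tree: an operation call with a continuation."""
    name: str
    arg: Any
    k: Callable[[Any], Any]

def kleisli(f):
    """Lift f : X -> Tree(Y) to a map Tree(X) -> Tree(Y).

    The lifted map walks down the tree and replaces each leaf `Return(x)`
    with the tree `f(x)`, leaving all operation nodes in place."""
    def lifted(tree):
        if isinstance(tree, Return):
            return f(tree.value)
        return Op(tree.name, tree.arg, lambda b: lifted(tree.k(b)))
    return lifted

# Usage: read the state, then store its successor and return the old value.
tree = Op("getenv", None, lambda c: Return(c))
step = lambda x: Op("setenv", x + 1, lambda _: Return(x))
lifted = kleisli(step)(tree)
after_read = lifted.k(5)          # the subtree reached when getenv yields 5
```

Lifting the unit, `kleisli(Return)`, leaves the shape of a tree unchanged, which is one of the monad laws.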

A fundamental insight of Plotkin and Power [25,28] was that many computational 
effects may be adequately described by algebraic theories, with the 
elements of free models corresponding to effectful computations. For example, 
the monads induced by the theories from Examples 1 and 2 are respectively 
isomorphic to the usual state monad $\mathsf{St}_C X \stackrel{\text{def}}{=} (C \Rightarrow X \times C)$ and the exceptions 
monad $\mathsf{Exc}_E X \stackrel{\text{def}}{=} X + E$. 

Plotkin and Pretnar [30] further observed that the universal property of free 
models may be used to model a programming concept known as handlers. Given 
a $\mathcal{T}$-model $M$ and a map $f : X \to |M|$, the universal property of the free 
$\mathcal{T}$-model gives us a unique $\mathcal{T}$-homomorphism $f^\ddagger : \mathsf{Free}_{\mathcal{T}}(X) \to |M|$ satisfying 

$$f^\ddagger[\mathsf{return}\,x] = f\,x, \qquad f^\ddagger[\mathsf{op}(a, \kappa)] = \mathsf{op}_M(a, f^\ddagger \circ \kappa).$$

A handler for a theory $\mathcal{T}$ in a language such as Eff amounts to a model $M$ 
whose carrier $|M|$ is the carrier $\mathsf{Free}_{\mathcal{T}'}(Y)$ of the free model for some other 
theory $\mathcal{T}'$, while the associated handling construct is the induced $\mathcal{T}$-homomorphism 
$\mathsf{Free}_{\mathcal{T}}(X) \to \mathsf{Free}_{\mathcal{T}'}(Y)$. Thus handling transforms computations with effects $\mathcal{T}$ 
to computations with effects $\mathcal{T}'$. There is however no restriction on how a 
handler implements an operation; in particular, it may use its continuation in an 
arbitrary fashion. We shall put the universal property of free models to good use 
as well, while making sure that the continuations are always used affinely. 
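
Read as a program, the induced homomorphism is a fold over the computation tree. The Python sketch below illustrates this for a signature without equations; the names `handle`, `all_results`, `first_only`, and `give_up` are hypothetical, and the carrier of each model is simply Python lists.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Return:
    value: Any

@dataclass
class Op:
    name: str
    arg: Any
    k: Callable[[Any], Any]

def handle(tree, ret, ops):
    """Fold a tree: `ret` interprets leaves, `ops[name]` interprets nodes.

    Each entry of `ops` receives the parameter and the already-handled
    continuation, and is free to call that continuation any number of times."""
    if isinstance(tree, Return):
        return ret(tree.value)
    return ops[tree.name](tree.arg, lambda b: handle(tree.k(b), ret, ops))

# Three models for a single operation `choose`, all with lists as carrier.
all_results = {"choose": lambda _a, k: k(True) + k(False)}  # continuation used twice
first_only = {"choose": lambda _a, k: k(True)}              # continuation used once
give_up = {"choose": lambda _a, k: []}                      # continuation discarded

two_coins = Op("choose", None, lambda x:
            Op("choose", None, lambda y: Return((x, y))))

every = handle(two_coins, lambda v: [v], all_results)
one = handle(two_coins, lambda v: [v], first_only)
none = handle(two_coins, lambda v: [v], give_up)
```

Only `first_only` uses its continuation in the restricted way that the rest of the paper relies on; the other two models are legitimate handlers but duplicate or drop the remainder of the computation.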


2.2 Runners 


Much like monads, handlers are useful for simulating computational effects, because 
they allow us to transform $\mathcal{T}$-computations to $\mathcal{T}'$-computations. However, 
eventually there has to be a “top level” where such transformations cease and 
actual computational effects happen. For these we need another concept, known 
as runners [35]. Runners are equivalent to the concept of comodels [27,31], which 
are “just models in the opposite category”, although one has to apply the motto 
correctly by using powers and co-powers where seemingly exponentials and prod- 
ucts would do. Without getting into the intricacies, let us spell out the definition. 


Definition 1. A runner $R$ for a signature $\Sigma$ is given by a carrier set $|R|$ together 
with, for each $\mathsf{op} \in \Sigma$, a co-operation $\overline{\mathsf{op}}_R : A_{\mathsf{op}} \to (|R| \Rightarrow B_{\mathsf{op}} \times |R|)$. 


Runners are usually defined to have co-operations in the equivalent uncurried 
form $\overline{\mathsf{op}}_R : A_{\mathsf{op}} \times |R| \to B_{\mathsf{op}} \times |R|$, but that is less convenient for our purposes. 
Runners may be defined more generally for theories $\mathcal{T}$, rather than just signatures, 
by requiring that the co-operations satisfy $\mathsf{Eq}_{\mathcal{T}}$. We shall have no use 
for these, although we expect no obstacles in incorporating them into our work. 

A runner tells us what to do when an effectful computation reaches the 
top-level runtime environment. Think of $|R|$ as the set of configurations of 
the runtime environment. Given the current configuration $c \in |R|$, the operation 
$\mathsf{op}(a, \kappa)$ is executed as the corresponding co-operation $\overline{\mathsf{op}}_R\,a\,c$ whose result 
$(b, c') \in B_{\mathsf{op}} \times |R|$ gives the result of the operation $b$ and the next runtime 
configuration $c'$. The continuation $\kappa\,b$ then proceeds in runtime configuration $c'$. 
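It is not too difficult to turn this idea into a mathematical model. For any 
$X$, the co-operations induce a $\Sigma$-structure $M$ with $|M| \stackrel{\text{def}}{=} \mathsf{St}_{|R|} X = (|R| \Rightarrow X \times |R|)$ 
and operations $\mathsf{op}_M : A_{\mathsf{op}} \times (B_{\mathsf{op}} \Rightarrow \mathsf{St}_{|R|} X) \to \mathsf{St}_{|R|} X$ given by 

$$\mathsf{op}_M(a, \kappa) \stackrel{\text{def}}{=} \lambda c.\, \kappa\,(\pi_1(\overline{\mathsf{op}}_R\,a\,c))\,(\pi_2(\overline{\mathsf{op}}_R\,a\,c)).$$

We may then use the universal property of the free $\Sigma$-model to obtain a 
$\Sigma$-homomorphism $r_X : \mathsf{Tree}_\Sigma(X) \to \mathsf{St}_{|R|} X$ satisfying the equations 

$$r_X(\mathsf{return}\,x) = \lambda c.\,(x, c), \qquad r_X(\mathsf{op}(a, \kappa)) = \mathsf{op}_M(a, r_X \circ \kappa).$$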


The map $r_X$ precisely captures the idea that a runner runs computations by 
transforming (static) computation trees into state-passing maps. Note how in 
the above definition of $\mathsf{op}_M$, the continuation $\kappa$ is used in a controlled way, as 
it appears precisely once as the head of the outermost application. In terms of 
programming, this corresponds to linear use in a tail-call position. 
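
The way a runner turns a computation tree into a state-passing map can be sketched in a few lines of Python. This is an illustration under assumptions: the names `run`, `coops`, and the `fresh` operation are invented for the example, and a co-operation is represented as a curried Python function from a parameter and a configuration to a pair of a result and the next configuration.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Return:
    value: Any

@dataclass
class Op:
    name: str
    arg: Any
    k: Callable[[Any], Any]

def run(tree, coops):
    """Turn a computation tree into a state-passing map c -> (x, c').

    Each operation node is executed by its co-operation, and the continuation
    is resumed exactly once, with the result, in the updated configuration.
    Because that resumption is a tail call, a plain loop suffices."""
    def state_passing(c):
        t = tree
        while isinstance(t, Op):
            b, c = coops[t.name](t.arg)(c)   # co-operation: a -> (c -> (b, c'))
            t = t.k(b)
        return (t.value, c)
    return state_passing

# A runner whose configurations are integers and whose single operation
# `fresh` returns the current counter and increments it.
coops = {"fresh": lambda _a: lambda n: (n, n + 1)}

two_fresh = Op("fresh", None, lambda a:
            Op("fresh", None, lambda b: Return((a, b))))

outcome = run(two_fresh, coops)(10)
```

Starting from configuration 10, the two calls receive 10 and 11, and the final configuration is 12.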

Runners are less ad-hoc than they may seem. First, notice that $\mathsf{op}_M$ is just the 
composition of the co-operation $\overline{\mathsf{op}}_R$ with the state monad’s Kleisli extension of 
the continuation $\kappa$, and so is the standard way of turning generic effects into 
$\Sigma$-structures [26]. Second, the map $r_X$ is the component at $X$ of a monad morphism 
$r : \mathsf{Tree}_\Sigma(-) \to \mathsf{St}_{|R|}$. Møgelberg & Staton [21], as well as Uustalu [35], showed 
that the passage from a runner $R$ to the corresponding monad morphism $r$ forms 
a one-to-one correspondence between the former and the latter. 

As defined, runners are too restrictive a model of top-level computation, 
because the only effect available to co-operations is state, but in practice the 
runtime environment may also signal errors and perform other effects, by calling 
its own runtime environment. We are led to the following generalisation. 


Definition 2. For a signature $\Sigma$ and monad $T$, a $T$-runner $R$ for $\Sigma$, or just an 
effectful runner, is given by, for each $\mathsf{op} \in \Sigma$, a co-operation $\overline{\mathsf{op}}_R : A_{\mathsf{op}} \to T\,B_{\mathsf{op}}$. 


The correspondence between runners and monad morphisms still holds. 


Proposition 3. For a signature $\Sigma$ and a monad $T$, the monad morphisms 
$\mathsf{Tree}_\Sigma(-) \to T$ are in one-to-one correspondence with $T$-runners for $\Sigma$. 


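Proof. This is an easy generalisation of the correspondence for ordinary runners. 
Let us fix a signature $\Sigma$, and a monad $T$ with unit $\eta$ and Kleisli extension $-^\dagger$. 

Let $R$ be a $T$-runner for $\Sigma$. For any set $X$, $R$ induces a $\Sigma$-structure $M$ 
with $|M| \stackrel{\text{def}}{=} T X$ and $\mathsf{op}_M : A_{\mathsf{op}} \times (B_{\mathsf{op}} \Rightarrow T X) \to T X$ defined as 
$\mathsf{op}_M(a, \kappa) \stackrel{\text{def}}{=} \kappa^\dagger(\overline{\mathsf{op}}_R\,a)$. As before, the universal property of the free model 
$\mathsf{Tree}_\Sigma(X)$ provides a unique $\Sigma$-homomorphism $r_X : \mathsf{Tree}_\Sigma(X) \to T X$, satisfying the equations 

$$r_X(\mathsf{return}\,x) = \eta_X(x), \qquad r_X(\mathsf{op}(a, \kappa)) = \mathsf{op}_M(a, r_X \circ \kappa).$$

The maps $r_X$ collectively give us the desired monad morphism $r$ induced by $R$. 
Conversely, given a monad morphism $\theta : \mathsf{Tree}_\Sigma(-) \to T$, we may recover a 
$T$-runner $R$ for $\Sigma$ by defining the co-operations as $\overline{\mathsf{op}}_R\,a \stackrel{\text{def}}{=} \theta_{B_{\mathsf{op}}}(\mathsf{op}(a, \lambda b.\, \mathsf{return}\,b))$. 
It is not hard to check that we have described a one-to-one correspondence. 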
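
One half of this check can be spelled out as a short calculation. Starting from a $T$-runner $R$ with induced morphism $r$, applying the recipe for recovering co-operations returns the co-operation we started with; the derivation below uses only the defining equations of $r$ and the monad law $\eta^\dagger = \mathrm{id}$, and is meant as a worked illustration rather than the full argument.

```latex
\begin{align*}
r_{B_{\mathsf{op}}}\bigl(\mathsf{op}(a, \lambda b.\, \mathsf{return}\,b)\bigr)
  &= \mathsf{op}_M\bigl(a,\; r_{B_{\mathsf{op}}} \circ (\lambda b.\, \mathsf{return}\,b)\bigr)
     && \text{($r$ is a $\Sigma$-homomorphism)} \\
  &= \mathsf{op}_M\bigl(a,\; \eta_{B_{\mathsf{op}}}\bigr)
     && \text{($r(\mathsf{return}\,b) = \eta(b)$)} \\
  &= \eta_{B_{\mathsf{op}}}^{\dagger}\bigl(\overline{\mathsf{op}}_R\,a\bigr)
     && \text{(definition of $\mathsf{op}_M$)} \\
  &= \overline{\mathsf{op}}_R\,a
     && \text{(monad law $\eta^\dagger = \mathrm{id}$)}
\end{align*}
```

The other half, that a monad morphism is determined by its values on the trees $\mathsf{op}(a, \lambda b.\, \mathsf{return}\,b)$, proceeds by induction on trees.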
3 Programming with runners 


If ordinary runners are not general enough, the effectful ones are too general: 
parameterised by arbitrary monads T, they do not combine easily and they lack 
a clear notion of resource management. Thus, we now engineer more specific 
monads whose associated runners can be turned into a programming concept. 
While we give up complete generality, the monads presented below are still quite 
versatile, as they are parameterised by arbitrary algebraic signatures $\Sigma$, and so 
are extensible and support various combinations of effects. 


3.1 The user and kernel monads 


Effectful source code running inside a runtime environment is just one example 
of a more general phenomenon in which effectful computations are enveloped by 
a layer that provides supervised access to external resources: a user process 
is controlled by a kernel, a web page by a browser, an operating system by 
hardware, or a virtual machine, etc. We shall adopt the parlance of software 
systems, and refer to the two layers generically as the user and kernel code. 
Since the two kinds of code need not, and will not, use the same effects, each 
will be described by its own algebraic theory and compute in its own monad. 
We first address the kernel theory. Specifically, we look for an algebraic theory 
such that effectful runners for the induced monad satisfy the following desiderata: 


1. Runners support management and controlled finalisation of resources. 
2. Runners may use further external resources. 
3. Runners may signal failure caused by unavoidable circumstances. 


The totality of external resources available to user code appears as a stateful 
external environment, even though it has no direct access to it. Thus, kernel 
computations should carry state. We achieve this by incorporating into the kernel 
theory the operations getenv and setenv, and equations for state from Example 1. 

Apart from managing state, kernel code should have access to further effects, 
which may be true external effects, or some outer layer of runners. In either case, 
we should allow the kernel code to call operations from a given signature $\Sigma$. 

Because kernel computations ought to be able to signal failure, we should 
include an exception mechanism. In practice, many programming languages and 
systems have two flavours of exceptions, variously called recoverable and fatal, 
checked and unchecked, exceptions and errors, etc. One kind, which we call just 
exceptions, is raised by kernel code when a situation requires special attention 
by user code. The other kind, which we call signals, indicates an unrecoverable 
condition that prevents normal execution of user code. These correspond pre- 
cisely to the two standard ways of combining exceptions with state, namely the 
coproduct and the tensor of algebraic theories [11]. The coproduct simply adjoins 
exceptions $\mathsf{raise} : E \rightsquigarrow 0$ from Example 2 to the theory of state, while the tensor 
extends the theory of state with signals $\mathsf{kill} : S \rightsquigarrow 0$, together with equations 

$$\mathsf{getenv}(\lambda c.\, \mathsf{kill}\,s) = \mathsf{kill}\,s, \qquad \mathsf{setenv}(c, \mathsf{kill}\,s) = \mathsf{kill}\,s. \qquad (1)$$

These equations say that a signal discards state, which makes it unrecoverable. 

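To summarise, the kernel theory $\mathcal{K}_{\Sigma,E,S,C}$ contains operations from a signature 
$\Sigma$, as well as state operations $\mathsf{getenv} : 1 \rightsquigarrow C$, $\mathsf{setenv} : C \rightsquigarrow 1$, exceptions 
$\mathsf{raise} : E \rightsquigarrow 0$, and signals $\mathsf{kill} : S \rightsquigarrow 0$, with equations for state from Example 1, 
equations (1) relating state and signals, and for each operation $\mathsf{op} \in \Sigma$, equations 

$$\mathsf{getenv}(\lambda c.\, \mathsf{op}(a, \kappa\,c)) = \mathsf{op}(a, \lambda b.\, \mathsf{getenv}(\lambda c.\, \kappa\,c\,b)),$$
$$\mathsf{setenv}(c, \mathsf{op}(a, \kappa)) = \mathsf{op}(a, \lambda b.\, \mathsf{setenv}(c, \kappa\,b)),$$

expressing that external operations do not interact with kernel state. It is not 
difficult to see that $\mathcal{K}_{\Sigma,E,S,C}$ induces, up to isomorphism, the kernel monad 

$$\mathsf{K}_{\Sigma,E,S,C}\,X \stackrel{\text{def}}{=} C \Rightarrow \mathsf{Tree}_\Sigma\bigl(((X + E) \times C) + S\bigr).$$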
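
The shape of the kernel monad can be made concrete with a small Python model. The sketch below is an assumption-laden illustration: a kernel computation is a function from a state to a computation tree whose leaves are tagged tuples `("val", x, c)`, `("exc", e, c)`, or `("sig", s)`, and all function names prefixed with `k_` are invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Return:
    value: Any

@dataclass(frozen=True)
class Op:
    name: str
    arg: Any
    k: Callable[[Any], Any]

# A kernel computation is a function: state -> tree with tagged leaves.
def k_return(x):
    return lambda c: Return(("val", x, c))      # value together with final state

def k_raise(e):
    return lambda c: Return(("exc", e, c))      # exceptions keep the state

def k_kill(s):
    return lambda c: Return(("sig", s))         # signals discard the state

def k_getenv(k):
    return lambda c: k(c)(c)                    # k : C -> kernel computation

def k_setenv(new_c, k):
    return lambda _old: k(new_c)                # k : kernel computation

def k_op(name, a, k):
    # External operations do not touch the kernel state.
    return lambda c: Op(name, a, lambda b: k(b)(c))

# Equations (1): a signal is unaffected by preceding state manipulation.
kill_after_set = k_setenv(5, k_kill("IOError"))(0)
kill_after_get = k_getenv(lambda c: k_kill("IOError"))(0)
plain_kill = k_kill("IOError")(0)

# By contrast, an exception observes the updated state.
raise_after_set = k_setenv(5, k_raise("QuotaExceeded"))(0)

# getenv commutes with an external operation `print` (hypothetical name).
kappa = lambda c: lambda b: k_return((c, b))
lhs = k_getenv(lambda c: k_op("print", "a", kappa(c)))(3)
rhs = k_op("print", "a", lambda b: k_getenv(lambda c: kappa(c)(b)))(3)
```

The difference between `raise_after_set` and `kill_after_set` mirrors the difference between the coproduct and the tensor of exceptions with state.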


How about user code? It can of course call operations from a signature $\Sigma$ 
(not necessarily the same as the kernel code), and because we intend it to handle 
exceptions, it might as well have the ability to raise them. However, user code 
knows nothing about signals and kernel state. Thus, we choose the user theory 
$\mathcal{U}_{\Sigma,E}$ to be the algebraic theory with operations $\Sigma$, exceptions $\mathsf{raise} : E \rightsquigarrow 0$, and 
no equations. This theory induces the user monad $\mathsf{U}_{\Sigma,E}\,X \stackrel{\text{def}}{=} \mathsf{Tree}_\Sigma(X + E)$. 


3.2 Runners as a programming construct 


In this section, we turn the ideas presented so far into programming constructs. 
We strive for a realistic result, but when faced with several design options, we 
prefer simplicity and semantic clarity. We focus here on translating the central 
concepts, and postpone various details to §4, where we present a full calculus. 
We codify the idea of user and kernel computations by having syntactic 
categories for each of them, as well as one for values. We use letters M, N to 
indicate user computations, K, L for kernel computations, and V, W for values. 
User and kernel code raise exceptions with operation raise, and catch them 
with exception handlers based on Benton and Kennedy’s exceptional syntax [7], 


$$\mathsf{try}\ M\ \mathsf{with}\ \{\mathsf{return}\,x \mapsto N, \ldots, \mathsf{raise}\,e \mapsto N_e, \ldots\},$$

and analogously for kernel code. The familiar binding construct $\mathsf{let}\ x = M\ \mathsf{in}\ N$ 
is simply shorthand for $\mathsf{try}\ M\ \mathsf{with}\ \{\mathsf{return}\,x \mapsto N, \ldots, \mathsf{raise}\,e \mapsto \mathsf{raise}\,e, \ldots\}$. 
As a programming concept, a runner $R$ takes the form 

$$\{(\mathsf{op}\,x \mapsto K_{\mathsf{op}})_{\mathsf{op} \in \Sigma}\}_C,$$

where each $K_{\mathsf{op}}$ is a kernel computation, with the variable $x$ bound in $K_{\mathsf{op}}$, so 
that each clause $\mathsf{op}\,x \mapsto K_{\mathsf{op}}$ determines a co-operation for the kernel monad. 
The subscript $C$ indicates the type of the state used by the kernel code $K_{\mathsf{op}}$. 
The corresponding elimination form is a handling-like construct 

$$\mathsf{using}\ R\ @\ V\ \mathsf{run}\ M\ \mathsf{finally}\ F, \qquad (2)$$

which uses the co-operations of runner $R$ “at” initial kernel state $V$ to run user 
code $M$, and finalises its return value, exceptions, and signals with $F$, see (3) 
below. When user code $M$ calls an operation $\mathsf{op}$, the enveloping run construct 
runs the corresponding co-operation $K_{\mathsf{op}}$ of $R$. While doing so, $K_{\mathsf{op}}$ might raise 
exceptions. But not every exception makes sense for every operation, and so 
we assign to each operation $\mathsf{op}$ a set of exceptions $E_{\mathsf{op}}$ which the co-operations 
implementing it may raise, by augmenting its operation signature with $E_{\mathsf{op}}$, as 

$$\mathsf{op} : A_{\mathsf{op}} \rightsquigarrow B_{\mathsf{op}} \mathbin{!} E_{\mathsf{op}}.$$

An exception raised by the co-operation $K_{\mathsf{op}}$ propagates back to the operation 
call in the user code. Therefore, an operation call should have not only a 
continuation $x.M$ receiving a result, but also continuations $N_e$, one for each $e \in E_{\mathsf{op}}$, 

$$\mathsf{op}(V, (x.M), (N_e)_{e \in E_{\mathsf{op}}}).$$

If $K_{\mathsf{op}}$ returns a value $b \in B_{\mathsf{op}}$, the execution proceeds as $M[b/x]$, and as $N_e$ if 
$K_{\mathsf{op}}$ raises an exception $e \in E_{\mathsf{op}}$. In examples, we use the generic versions of 
operations [26], written $\mathsf{op}\,V$, which pass on return values and re-raise exceptions. 

One can pass exceptions back to operation calls also in a language with handlers, 
such as Eff, by changing the signatures of operations to $A_{\mathsf{op}} \rightsquigarrow B_{\mathsf{op}} + E_{\mathsf{op}}$, 
and implementing the exception mechanism by hand, so that every operation call 
is followed by a case distinction on $B_{\mathsf{op}} + E_{\mathsf{op}}$. One is reminded of how operating 
system calls communicate errors back to user code as exceptional values. 

A co-operation $K_{\mathsf{op}}$ may also send a signal, in which case the rest of the user 
code $M$ is skipped and the control proceeds directly to the corresponding case 
of the finalisation part $F$ of the run construct (2), whose syntactic form is 

$$\{\mathsf{return}\,x\ @\ c \mapsto N, \ldots, \mathsf{raise}\,e\ @\ c \mapsto N_e, \ldots, \mathsf{kill}\,s \mapsto N_s, \ldots\}. \qquad (3)$$

Specifically, if $M$ returns a value $v$, then $N$ is evaluated with $x$ bound to $v$ and $c$ 
to the final kernel state; if $M$ raises an exception $e$ (either directly or indirectly 
via a co-operation of $R$), then $N_e$ is executed, again with $c$ bound to the final 
kernel state; and if a co-operation of $R$ sends a signal $s$, then $N_s$ is executed. 


Example 4. In anticipation of setting up the complete calculus we show how one 
can work with files. The language implementors can provide an operation open 
which opens a file for writing and returns its file handle, an operation close which 
closes a file handle, and a runner fileIO that implements writing. Let us further 
suppose that fileIO may raise an exception QuotaExceeded if a write exceeds the 
user disk quota, and send a signal IOError if an unrecoverable external error 
occurs. The following code illustrates how to guarantee proper closing of the file: 


using fileIO @ (open "hello.txt") run 
  write "Hello, world." 
finally { 
  return x @ fh ↦ close fh, 
  raise QuotaExceeded @ fh ↦ close fh, 
  kill IOError ↦ return () } 

Notice that the user code does not have direct access to the file handle. Instead, 
the runner holds it in its state, where it is available to the co-operation that 
implements write. The finalisation block gets access to the file handle upon suc- 
cessful completion and raised exception, so it can close the file, but when a signal 
happens the finalisation cannot close the file, nor should it attempt to do so. 
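
The behaviour of the run construct in this example can be imitated by a small interpreter. The Python sketch below is a simulation under stated assumptions, not the semantics of the calculus: `using`, `write_coop`, `QUOTA`, and the fake file system `disk` are all invented here, operation calls are the generic ones (exceptions raised by a co-operation are re-raised in user code), and a co-operation returns a tagged tuple `("val", b, c')`, `("exc", e, c')`, or `("sig", s)`.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Return:              # user code returned a value
    value: Any

@dataclass
class Raise:               # user code raised an exception
    exc: str

@dataclass
class Op:                  # user code calls an operation
    name: str
    arg: Any
    k: Callable[[Any], Any]

def using(coops, state, tree, fin):
    """Run `tree` with co-operations `coops` at initial kernel state `state`,
    then pass control to exactly one clause of the finaliser `fin`."""
    while True:
        if isinstance(tree, Return):
            return fin["return"](tree.value, state)
        if isinstance(tree, Raise):
            return fin["raise"][tree.exc](state)
        outcome = coops[tree.name](tree.arg, state)
        if outcome[0] == "sig":            # signal: skip the rest, state is lost
            return fin["kill"][outcome[1]]()
        tag, payload, state = outcome
        tree = tree.k(payload) if tag == "val" else Raise(payload)

disk = {}                  # fake file system: handle -> list of written strings
closed = []                # every close is recorded here
QUOTA = 20                 # hypothetical per-write quota

def open_file(name):
    disk[name] = []
    return name            # in this sketch the file handle is just the name

def write_coop(text, fh):
    if fh not in disk:
        return ("sig", "IOError")
    if len(text) > QUOTA:
        return ("exc", "QuotaExceeded", fh)
    disk[fh].append(text)
    return ("val", None, fh)

file_io = {"write": write_coop}
finaliser = {
    "return": lambda x, fh: closed.append(fh),
    "raise": {"QuotaExceeded": lambda fh: closed.append(fh)},
    "kill": {"IOError": lambda: None},
}

ok = Op("write", "Hello, world.", lambda _: Return(None))
too_big = Op("write", "x" * 100, lambda _:
          Op("write", "never runs", lambda _: Return(None)))

using(file_io, open_file("hello.txt"), ok, finaliser)
using(file_io, open_file("big.txt"), too_big, finaliser)
using(file_io, "missing.txt", ok, finaliser)   # handle was never opened
```

In the first two runs the file is closed exactly once, whether the user code returns or an exception is raised; in the third run the co-operation sends a signal and the finaliser makes no attempt to close anything.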

We also mention that the code “cheats” by placing the call to open in a posi- 
tion where a value is expected. We should have let-bound the file handle returned 
by open outside the run construct, which would make it clear that opening the 
file happens before this construct (and that open is not handled by the finalisa- 
tion), but would also expose the file handle. Since there are clear advantages to 
keeping the file handle inaccessible, a realistic language should accept the above 
code and hoist computations from value positions automatically. 


4 A calculus for programming with runners 


Inspired by the semantic notion of runners and the ideas of the previous section, 
we now present a calculus for programming with co-operations and runners, 
called $\lambda_{\mathsf{coop}}$. It is a low-level fine-grain call-by-value calculus [19], and as such 
could inspire an intermediate language that a high-level language is compiled to. 


4.1 Types 


The types of $\lambda_{\mathsf{coop}}$ are shown in Fig. 1. The ground types contain base types, and 
are closed under finite sums and products. These are used in operation signa- 
tures and as types of kernel state. (Allowing arbitrary types in either of these 
entails substantial complications that can be dealt with but are tangential to 
our goals.) Ground types can also come with corresponding constant symbols $\mathsf{f}$, 
each associated with a fixed constant signature $\mathsf{f} : (A_1, \ldots, A_n) \to B$. 

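We assume a supply of operation symbols $\mathcal{O}$, exception names $\mathcal{E}$, and signal 
names $\mathcal{S}$. Each operation symbol $\mathsf{op} \in \mathcal{O}$ is equipped with an operation signature 
$A_{\mathsf{op}} \rightsquigarrow B_{\mathsf{op}} \mathbin{!} E_{\mathsf{op}}$, which specifies its parameter type $A_{\mathsf{op}}$ and arity type $B_{\mathsf{op}}$, and 
the exceptions $E_{\mathsf{op}}$ that the corresponding co-operations may raise in runners. 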

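The value types extend ground types with two function types, and a type 
of runners. The user function type $X \to Y \mathbin{!} (\Sigma, E)$ classifies functions taking 
arguments of type $X$ to computations classified by the user (computation) 
type $Y \mathbin{!} (\Sigma, E)$, i.e., those that return values of type $Y$, and may call 
operations $\Sigma$ and raise exceptions $E$. Similarly, the kernel function type 
$X \to Y \lightning (\Sigma, E, S, C)$ classifies functions taking arguments of type $X$ to computations 
classified by the kernel (computation) type $Y \lightning (\Sigma, E, S, C)$, i.e., those that return 
values of type $Y$, and may call operations $\Sigma$, raise exceptions $E$, send signals $S$, 
and use state of type $C$. We note that the ingredients for user and kernel types 
correspond precisely to the parameters of the user monad $\mathsf{U}_{\Sigma,E}$ and the kernel 
monad $\mathsf{K}_{\Sigma,E,S,C}$ from §3.1. Finally, the runner type $\Sigma \Rightarrow (\Sigma', S, C)$ classifies 
runners that implement co-operations for the operations $\Sigma$ as kernel computations 
which use operations $\Sigma'$, send signals $S$, and use state of type $C$. 

$$\begin{array}{llcll}
\text{Ground type} & A, B, C & ::= & \mathsf{b} & \text{base type} \\
 & & \mid & \mathsf{unit} & \text{unit type} \\
 & & \mid & \mathsf{empty} & \text{empty type} \\
 & & \mid & A \times B & \text{product type} \\
 & & \mid & A + B & \text{sum type} \\[1ex]
\text{Constant signature:} & \multicolumn{4}{l}{\mathsf{f} : (A_1, \ldots, A_n) \to B} \\[1ex]
\text{Signature} & \Sigma & ::= & \{\mathsf{op}_1, \mathsf{op}_2, \ldots, \mathsf{op}_n\} \subset \mathcal{O} & \\
\text{Exception set} & E & ::= & \{e_1, e_2, \ldots, e_n\} \subset \mathcal{E} & \\
\text{Signal set} & S & ::= & \{s_1, s_2, \ldots, s_n\} \subset \mathcal{S} & \\[1ex]
\text{Operation signature:} & \multicolumn{4}{l}{\mathsf{op} : A_{\mathsf{op}} \rightsquigarrow B_{\mathsf{op}} \mathbin{!} E_{\mathsf{op}}} \\[1ex]
\text{Value type} & X, Y, Z & ::= & A & \text{ground type} \\
 & & \mid & X \times Y & \text{product type} \\
 & & \mid & X + Y & \text{sum type} \\
 & & \mid & X \to Y \mathbin{!} \mathcal{U} & \text{user function type} \\
 & & \mid & X \to Y \lightning \mathcal{K} & \text{kernel function type} \\
 & & \mid & \Sigma \Rightarrow (\Sigma', S, C) & \text{runner type} \\[1ex]
\text{User (computation) type:} & \multicolumn{4}{l}{X \mathbin{!} \mathcal{U} \quad \text{where } \mathcal{U} = (\Sigma, E)} \\
\text{Kernel (computation) type:} & \multicolumn{4}{l}{X \lightning \mathcal{K} \quad \text{where } \mathcal{K} = (\Sigma, E, S, C)}
\end{array}$$

Fig. 1. The types of $\lambda_{\mathsf{coop}}$. 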


4.2 Values and computations 


The syntax of terms is shown in Fig. 2. The usual fine-grain call-by-value strat- 
ification of terms into pure values and effectful computations is present, except 
that we further distinguish between user and kernel computations. 


Values Among the values are variables, constants for ground types, and con- 
structors for sums and products. There are two kinds of functions, for abstracting 
over user and kernel computations. A runner is a value of the form 


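$$\{(\mathsf{op}\,x \mapsto K_{\mathsf{op}})_{\mathsf{op} \in \Sigma}\}_C.$$


It implements co-operations for operations $\mathsf{op}$ as kernel computations $K_{\mathsf{op}}$, with 
$x$ bound in $K_{\mathsf{op}}$. The type annotation $C$ specifies the type of the state that $K_{\mathsf{op}}$ 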
uses. Note that C ranges over ground types, a restriction that allows us to define 
a naive set-theoretic semantics. We sometimes omit these type annotations. 


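User and kernel computations The user and kernel computations both have 
pure computations, function application, exception raising and handling, standard 
elimination forms, and operation calls. Note that the typing annotations 
on some of these differ according to their mode. For instance, a user operation 
call is annotated with the result type $X$, whereas the annotation $X\,@\,C$ on a 
kernel operation call also specifies the kernel state type $C$. 

Values 

$$\begin{array}{rcll}
V, W & ::= & x & \text{variable} \\
 & \mid & \mathsf{f}(V_1, \ldots, V_n) & \text{ground constant} \\
 & \mid & () & \text{unit} \\
 & \mid & (V, W) & \text{pair} \\
 & \mid & \mathsf{inl}_{X,Y}\,V \;\mid\; \mathsf{inr}_{X,Y}\,V & \text{injection} \\
 & \mid & \mathsf{fun}\ (x : X) \mapsto M & \text{user function} \\
 & \mid & \mathsf{funK}\ (x : X) \mapsto K & \text{kernel function} \\
 & \mid & \{(\mathsf{op}\,x \mapsto K_{\mathsf{op}})_{\mathsf{op} \in \Sigma}\}_C & \text{runner}
\end{array}$$

User computations 

$$\begin{array}{rcll}
M, N & ::= & \mathsf{return}\ V & \text{value} \\
 & \mid & V\,W & \text{application} \\
 & \mid & \mathsf{try}\ M\ \mathsf{with}\ \{\mathsf{return}\,x \mapsto N, (\mathsf{raise}\,e \mapsto N_e)_{e \in E}\} & \text{exception handler} \\
 & \mid & \mathsf{match}\ V\ \mathsf{with}\ \{(x, y) \mapsto M\} & \text{product elimination} \\
 & \mid & \mathsf{match}\ V\ \mathsf{with}\ \{\}_X & \text{empty elimination} \\
 & \mid & \mathsf{match}\ V\ \mathsf{with}\ \{\mathsf{inl}\,x \mapsto M, \mathsf{inr}\,y \mapsto N\} & \text{sum elimination} \\
 & \mid & \mathsf{op}_X(V, (x.M), (N_e)_{e \in E_{\mathsf{op}}}) & \text{operation call} \\
 & \mid & \mathsf{raise}_X\ e & \text{raise exception} \\
 & \mid & \mathsf{using}\ V\ @\ W\ \mathsf{run}\ M\ \mathsf{finally}\ F & \text{running user code} \\
 & \mid & \mathsf{kernel}\ K\ @\ W\ \mathsf{finally}\ F & \text{switch to kernel mode} \\[1ex]
F & ::= & \{\mathsf{return}\,x\ @\ c \mapsto N, (\mathsf{raise}\,e\ @\ c \mapsto N_e)_{e \in E}, (\mathsf{kill}\,s \mapsto N_s)_{s \in S}\} &
\end{array}$$

Kernel computations 

$$\begin{array}{rcll}
K, L & ::= & \mathsf{return}_C\ V & \text{value} \\
 & \mid & V\,W & \text{application} \\
 & \mid & \mathsf{try}\ K\ \mathsf{with}\ \{\mathsf{return}\,x \mapsto L, (\mathsf{raise}\,e \mapsto L_e)_{e \in E}\} & \text{exception handler} \\
 & \mid & \mathsf{match}\ V\ \mathsf{with}\ \{(x, y) \mapsto K\} & \text{product elimination} \\
 & \mid & \mathsf{match}\ V\ \mathsf{with}\ \{\}_{X @ C} & \text{empty elimination} \\
 & \mid & \mathsf{match}\ V\ \mathsf{with}\ \{\mathsf{inl}\,x \mapsto K, \mathsf{inr}\,y \mapsto L\} & \text{sum elimination} \\
 & \mid & \mathsf{op}_X(V, (x.K), (L_e)_{e \in E_{\mathsf{op}}}) & \text{operation call} \\
 & \mid & \mathsf{raise}_{X @ C}\ e & \text{raise exception} \\
 & \mid & \mathsf{kill}_{X @ C}\ s & \text{send signal} \\
 & \mid & \mathsf{getenv}_C(c.K) & \text{get kernel state} \\
 & \mid & \mathsf{setenv}(V, K) & \text{set kernel state} \\
 & \mid & \mathsf{user}\ M\ \mathsf{with}\ \{\mathsf{return}\,x \mapsto K, (\mathsf{raise}\,e \mapsto L_e)_{e \in E}\} & \text{switch to user mode}
\end{array}$$

Fig. 2. Values, user computations, and kernel computations of $\lambda_{\mathsf{coop}}$. 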

The binding construct $\mathsf{let}_{X!E}\ x = M\ \mathsf{in}\ N$ is not part of the syntax, but is an 
abbreviation for $\mathsf{try}\ M\ \mathsf{with}\ \{\mathsf{return}\,x \mapsto N, (\mathsf{raise}\,e \mapsto \mathsf{raise}_X\,e)_{e \in E}\}$, and there is 
an analogous one for kernel computations. We often drop the annotation $X!E$. 

Some computations are specific to one or the other mode. Only the kernel 
mode may send a signal with kill, and manipulate state with getenv and setenv, 
but only the user mode has the run construct from §3.2. Finally, each mode has 
the ability to “context switch” to the other one. The kernel computation 


$$\mathsf{user}\ M\ \mathsf{with}\ \{\mathsf{return}\,x \mapsto K, (\mathsf{raise}\,e \mapsto L_e)_{e \in E}\}$$


runs a user computation $M$ and handles the returned value and leftover exceptions 
with kernel computations $K$ and $L_e$. Conversely, the user computation 


$$\mathsf{kernel}\ K\ @\ W\ \mathsf{finally}\ \{\mathsf{return}\,x\ @\ c \mapsto M, (\mathsf{raise}\,e\ @\ c \mapsto N_e)_{e \in E}, (\mathsf{kill}\,s \mapsto N_s)_{s \in S}\}$$


runs kernel computation $K$ with initial state $W$, and handles the returned value, 
and leftover exceptions and signals with user computations $M$, $N_e$, and $N_s$. 


4.3 Type system 


We equip $\lambda_{\mathsf{coop}}$ with a type system akin to type and effect systems for algebraic 
effects and handlers [3,7,12]. We are experimenting with resource control, so it 
makes sense for the type system to tightly control resources. Consequently, our 
effect system does not allow effects to be implicitly propagated outwards. 

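In §4.1, we assumed that each operation $\mathsf{op} \in \mathcal{O}$ is equipped with some fixed 
operation signature $\mathsf{op} : A_{\mathsf{op}} \rightsquigarrow B_{\mathsf{op}} \mathbin{!} E_{\mathsf{op}}$. We also assumed a fixed constant 
signature $\mathsf{f} : (A_1, \ldots, A_n) \to B$ for each ground constant $\mathsf{f}$. We consider this 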
information to be part of the type system and say no more about it. 

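Values, user computations, and kernel computations each have a corresponding 
typing judgement form and a subtyping relation, given by 

$$\Gamma \vdash V : X, \qquad \Gamma \vdash M : X \mathbin{!} \mathcal{U}, \qquad \Gamma \vdash K : X \lightning \mathcal{K},$$
$$X \sqsubseteq Y, \qquad X \mathbin{!} \mathcal{U} \sqsubseteq Y \mathbin{!} \mathcal{V}, \qquad X \lightning \mathcal{K} \sqsubseteq Y \lightning \mathcal{L},$$

where $\Gamma$ is a typing context $x_1 : X_1, \ldots, x_n : X_n$. The effect information is an 
over-approximation, i.e., $M$ and $K$ employ at most the effects described by $\mathcal{U}$ 
and $\mathcal{K}$. The complete rules for these judgements are given in the online appendix. 
We comment here only on the rules that are peculiar to $\lambda_{\mathsf{coop}}$, see Fig. 3. 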
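Subtyping of ground types Sub-Ground is trivial, as it relates only equal 
types. Subtyping of runners Sub-Runner and kernel computations Sub-Kernel 
requires equality of the kernel state types $C$ and $C'$ because state is used invariantly 
in the kernel monad. We leave it for future work to replace $C = C'$ with 
a lens [10] from $C'$ to $C$, i.e., maps $C' \to C$ and $C' \times C \to C'$ satisfying state 
equations analogous to Example 1. It has been observed [24,31] that such a lens 
in fact amounts to an ordinary runner for $C$-valued state. 

$$\textsc{Sub-Ground}\ \dfrac{}{A \sqsubseteq A}
\qquad
\textsc{Sub-Runner}\ \dfrac{\Sigma_1' \subseteq \Sigma_1 \quad \Sigma_2 \subseteq \Sigma_2' \quad S \subseteq S' \quad C = C'}{\Sigma_1 \Rightarrow (\Sigma_2, S, C) \sqsubseteq \Sigma_1' \Rightarrow (\Sigma_2', S', C')}$$

$$\textsc{Sub-Kernel}\ \dfrac{X \sqsubseteq X' \quad \Sigma \subseteq \Sigma' \quad E \subseteq E' \quad S \subseteq S' \quad C = C'}{X \lightning (\Sigma, E, S, C) \sqsubseteq X' \lightning (\Sigma', E', S', C')}$$

$$\textsc{TyUser-Try}\ \dfrac{\Gamma \vdash M : X \mathbin{!} (\Sigma, E) \quad \Gamma, x : X \vdash N : Y \mathbin{!} (\Sigma, E') \quad \bigl(\Gamma \vdash N_e : Y \mathbin{!} (\Sigma, E')\bigr)_{e \in E}}{\Gamma \vdash \mathsf{try}\ M\ \mathsf{with}\ \{\mathsf{return}\,x \mapsto N, (\mathsf{raise}\,e \mapsto N_e)_{e \in E}\} : Y \mathbin{!} (\Sigma, E')}$$

$$\textsc{TyUser-Run}\ \dfrac{\begin{array}{c}
F = \{\mathsf{return}\,x\ @\ c \mapsto N, (\mathsf{raise}\,e\ @\ c \mapsto N_e)_{e \in E}, (\mathsf{kill}\,s \mapsto N_s)_{s \in S}\} \\
\Gamma \vdash V : \Sigma \Rightarrow (\Sigma', S, C) \quad \Gamma \vdash W : C \\
\Gamma \vdash M : X \mathbin{!} (\Sigma, E) \quad \Gamma, x : X, c : C \vdash N : Y \mathbin{!} (\Sigma', E') \\
\bigl(\Gamma, c : C \vdash N_e : Y \mathbin{!} (\Sigma', E')\bigr)_{e \in E} \quad \bigl(\Gamma \vdash N_s : Y \mathbin{!} (\Sigma', E')\bigr)_{s \in S}
\end{array}}{\Gamma \vdash \mathsf{using}\ V\ @\ W\ \mathsf{run}\ M\ \mathsf{finally}\ F : Y \mathbin{!} (\Sigma', E')}$$

$$\textsc{TyUser-Op}\ \dfrac{\begin{array}{c}
\mathcal{U} = (\Sigma, E) \quad \mathsf{op} \in \Sigma \quad \Gamma \vdash V : A_{\mathsf{op}} \\
\Gamma, x : B_{\mathsf{op}} \vdash M : X \mathbin{!} \mathcal{U} \quad \bigl(\Gamma \vdash N_e : X \mathbin{!} \mathcal{U}\bigr)_{e \in E_{\mathsf{op}}}
\end{array}}{\Gamma \vdash \mathsf{op}_X(V, (x.M), (N_e)_{e \in E_{\mathsf{op}}}) : X \mathbin{!} \mathcal{U}}$$

$$\textsc{TyKernel-Op}\ \dfrac{\begin{array}{c}
\mathcal{K} = (\Sigma, E, S, C) \quad \mathsf{op} \in \Sigma \quad \Gamma \vdash V : A_{\mathsf{op}} \\
\Gamma, x : B_{\mathsf{op}} \vdash K : X \lightning \mathcal{K} \quad \bigl(\Gamma \vdash L_e : X \lightning \mathcal{K}\bigr)_{e \in E_{\mathsf{op}}}
\end{array}}{\Gamma \vdash \mathsf{op}_X(V, (x.K), (L_e)_{e \in E_{\mathsf{op}}}) : X \lightning \mathcal{K}}$$

$$\textsc{TyUser-Kernel}\ \dfrac{\begin{array}{c}
F = \{\mathsf{return}\,x\ @\ c \mapsto N, (\mathsf{raise}\,e\ @\ c \mapsto N_e)_{e \in E}, (\mathsf{kill}\,s \mapsto N_s)_{s \in S}\} \\
\Gamma \vdash K : X \lightning (\Sigma, E, S, C) \quad \Gamma \vdash W : C \quad \Gamma, x : X, c : C \vdash N : Y \mathbin{!} (\Sigma, E') \\
\bigl(\Gamma, c : C \vdash N_e : Y \mathbin{!} (\Sigma, E')\bigr)_{e \in E} \quad \bigl(\Gamma \vdash N_s : Y \mathbin{!} (\Sigma, E')\bigr)_{s \in S}
\end{array}}{\Gamma \vdash \mathsf{kernel}\ K\ @\ W\ \mathsf{finally}\ F : Y \mathbin{!} (\Sigma, E')}$$

$$\textsc{TyKernel-User}\ \dfrac{\begin{array}{c}
\mathcal{K} = (\Sigma, E', S, C) \quad \Gamma \vdash M : X \mathbin{!} (\Sigma, E) \\
\Gamma, x : X \vdash K : Y \lightning \mathcal{K} \quad \bigl(\Gamma \vdash L_e : Y \lightning \mathcal{K}\bigr)_{e \in E}
\end{array}}{\Gamma \vdash \mathsf{user}\ M\ \mathsf{with}\ \{\mathsf{return}\,x \mapsto K, (\mathsf{raise}\,e \mapsto L_e)_{e \in E}\} : Y \lightning \mathcal{K}}$$

Fig. 3. Selected typing and subtyping rules. 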

The rules TyUser-Op and TyKernel-Op govern operation calls, where we
have a success continuation which receives a value returned by a co-operation,
and exceptional continuations which receive exceptions raised by co-operations.

The rule TyUser-Run requires that the runner $V$ implements all the operations
$M$ can use, meaning that operations are not implicitly propagated outside
a run block (which is different from how handlers are sometimes implemented).
Of course, the co-operations of the runner may call further external operations,
as recorded by the signature $\Sigma'$. Similarly, we require the finally block $F$ to
intercept all exceptions and signals that might be produced by the co-operations
of $V$ or the user code $M$. Such strict control is exercised throughout. For
example, in TyUser-Run, TyUser-Kernel, and TyKernel-User we catch all
the exceptions and signals that the code might produce. One should judiciously
relax these requirements in a language that is presented to the programmer, and
allow re-raising and re-sending clauses to be automatically inserted.


4.4 Equational theory 


We present $\lambda_{\mathsf{coop}}$ as an equational calculus, i.e., the interactions between its
components are described by equations. Such a presentation makes it easy to
reason about program equivalence. There are three equality judgements

$$\Gamma \vdash V \equiv W : X, \qquad \Gamma \vdash M \equiv N : X ! \mathcal{U}, \qquad \Gamma \vdash K \equiv L : X \lightning \mathcal{K}.$$

It is presupposed that we only compare well-typed expressions with the indicated
types. For the most part, the context and the type annotation on judgements
will play no significant role, and so we shall drop them whenever possible.

We comment on the computational equations for constructs characteristic
of $\lambda_{\mathsf{coop}}$, and refer the reader to the online appendix for other equations. When
read left-to-right, these equations explain the operational meaning of programs.

Of the three equations for run, the first two specify that returned values and
raised exceptions are handled by the corresponding clauses,

$$\mathsf{using}\ V\ @\ W\ \mathsf{run}\ (\mathsf{return}\ V')\ \mathsf{finally}\ F \equiv N[V'/x, W/c],$$
$$\mathsf{using}\ V\ @\ W\ \mathsf{run}\ (\mathsf{raise}_X\ e)\ \mathsf{finally}\ F \equiv N_e[W/c],$$

where $F = \{\mathsf{return}\ x\ @\ c \mapsto N, (\mathsf{raise}\ e\ @\ c \mapsto N_e)_{e \in E}, (\mathsf{kill}\ s \mapsto N_s)_{s \in S}\}$. The third
equation below relates running an operation $\mathsf{op}$ with executing the corresponding
co-operation $K_{\mathsf{op}}$, where $R$ stands for the runner $\{(\mathsf{op}\ x \mapsto K_{\mathsf{op}})_{\mathsf{op} \in \Sigma}\}_C$:

$$\begin{array}{l}
\mathsf{using}\ R\ @\ W\ \mathsf{run}\ (\mathsf{op}_X(V, (x . M), (N'_{e'})_{e' \in E_{\mathsf{op}}}))\ \mathsf{finally}\ F \equiv \\
\quad \mathsf{kernel}\ K_{\mathsf{op}}[V/x]\ @\ W\ \mathsf{finally} \\
\qquad \{\mathsf{return}\ x\ @\ c' \mapsto (\mathsf{using}\ R\ @\ c'\ \mathsf{run}\ M\ \mathsf{finally}\ F), \\
\qquad\ (\mathsf{raise}\ e'\ @\ c' \mapsto (\mathsf{using}\ R\ @\ c'\ \mathsf{run}\ N'_{e'}\ \mathsf{finally}\ F))_{e' \in E_{\mathsf{op}}}, \\
\qquad\ (\mathsf{kill}\ s \mapsto N_s)_{s \in S}\}
\end{array}$$

Because $K_{\mathsf{op}}$ is kernel code, it is executed in kernel mode, whose finally clauses
specify what happens afterwards: if $K_{\mathsf{op}}$ returns a value, or raises an exception,
execution continues with a suitable continuation, with $R$ wrapped around it; and
if $K_{\mathsf{op}}$ sends a signal, the corresponding finalisation code from $F$ is evaluated.
The next bundle describes how kernel code is executed within user code:

$$\begin{array}{rcl}
\mathsf{kernel}\ (\mathsf{return}_C\ V)\ @\ W\ \mathsf{finally}\ F &\equiv& N[V/x, W/c], \\
\mathsf{kernel}\ (\mathsf{raise}_{X \lightning C}\ e)\ @\ W\ \mathsf{finally}\ F &\equiv& N_e[W/c], \\
\mathsf{kernel}\ (\mathsf{kill}_{X \lightning C}\ s)\ @\ W\ \mathsf{finally}\ F &\equiv& N_s, \\
\mathsf{kernel}\ (\mathsf{getenv}_C(c . K))\ @\ W\ \mathsf{finally}\ F &\equiv& \mathsf{kernel}\ K[W/c]\ @\ W\ \mathsf{finally}\ F, \\
\mathsf{kernel}\ (\mathsf{setenv}(V, K))\ @\ W\ \mathsf{finally}\ F &\equiv& \mathsf{kernel}\ K\ @\ V\ \mathsf{finally}\ F.
\end{array}$$

We also have an equation stating that an operation called in kernel mode
propagates out to user mode, with its continuations wrapped in kernel mode:

$$\begin{array}{l}
\mathsf{kernel}\ \mathsf{op}_X(V, (x . K), (L_e)_{e \in E_{\mathsf{op}}})\ @\ W\ \mathsf{finally}\ F \equiv \\
\quad \mathsf{op}_X(V, (x .\ \mathsf{kernel}\ K\ @\ W\ \mathsf{finally}\ F), (\mathsf{kernel}\ L_e\ @\ W\ \mathsf{finally}\ F)_{e \in E_{\mathsf{op}}}).
\end{array}$$

Similar equations govern execution of user computations in kernel mode.

The remaining equations include standard $\beta\eta$-equations for exception
handling [7], deconstruction of products and sums, algebraicity equations for
operations [33], and the equations of kernel theory from §3.1, describing how getenv
and setenv work, and how they interact with signals and other operations.
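
Read left to right, the computational equations of this section behave like a small interpreter. The following Python sketch illustrates that reading under several assumptions that are ours and not the paper's: computations are encoded as finite trees built from hypothetical classes (`Return`, `Raise`, `Kill`, `Getenv`, `Setenv`, `Op`), a runner is a dictionary from operation names to co-operations, and types and annotations are ignored. It is not the Coop or Haskell-Coop implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

# Computation trees. Return, Raise and Op occur in user and kernel code;
# Kill, Getenv and Setenv occur in kernel code only.
@dataclass
class Return:
    value: Any

@dataclass
class Raise:
    exc: str

@dataclass
class Kill:
    sig: str

@dataclass
class Getenv:
    cont: Callable[[Any], Any]              # c. K

@dataclass
class Setenv:
    state: Any
    cont: Any                               # the kernel computation K

@dataclass
class Op:
    name: str
    arg: Any
    cont: Callable[[Any], Any]              # x. M
    exc_conts: Dict[str, Any]               # one continuation per exception

@dataclass
class Finally:
    ret: Callable[[Any, Any], Any]          # return x @ c -> N
    exc: Dict[str, Callable[[Any], Any]]    # raise e @ c -> N_e
    sig: Dict[str, Callable[[], Any]]       # kill s -> N_s

def kernel(k, w, fin):
    """kernel K @ W finally F, following the kernel-mode equations."""
    while True:
        if isinstance(k, Return):
            return fin.ret(k.value, w)
        if isinstance(k, Raise):
            return fin.exc[k.exc](w)
        if isinstance(k, Kill):
            return fin.sig[k.sig]()
        if isinstance(k, Getenv):
            k = k.cont(w)                   # read the current kernel state
        elif isinstance(k, Setenv):
            k, w = k.cont, k.state          # replace the kernel state
        else:
            # An operation propagates out, continuations wrapped in kernel mode.
            return Op(k.name, k.arg,
                      lambda x, k=k, w=w: kernel(k.cont(x), w, fin),
                      {e: kernel(l, w, fin) for e, l in k.exc_conts.items()})

def using(runner, w, m, fin):
    """using R @ W run M finally F, following the three equations for run."""
    if isinstance(m, Return):
        return fin.ret(m.value, w)
    if isinstance(m, Raise):
        return fin.exc[m.exc](w)
    # m is an operation call: execute the co-operation in kernel mode and
    # resume the user code with the runner wrapped around the continuation.
    inner = Finally(
        ret=lambda x, c: using(runner, c, m.cont(x), fin),
        exc={e: (lambda c, n=n: using(runner, c, n, fin))
             for e, n in m.exc_conts.items()},
        sig=fin.sig,
    )
    return kernel(runner[m.name](m.arg), w, inner)

# A hypothetical counter runner with integer state.
runner = {
    "incr": lambda _: Getenv(lambda c: Setenv(c + 1, Return(None))),
    "get": lambda _: Getenv(lambda c: Return(c)),
    "crash": lambda _: Kill("oops"),
}
fin = Finally(ret=lambda x, c: Return((x, c)), exc={},
              sig={"oops": lambda: Return("finalised")})
prog = Op("incr", None, lambda _:
          Op("incr", None, lambda _:
             Op("get", None, lambda x: Return(x), {}), {}), {})
result = using(runner, 0, prog, fin)
crashed = using(runner, 0, Op("crash", None, lambda x: Return(x), {}), fin)
```

In this encoding a signal skips the user continuation entirely and goes straight to the matching `kill` clause, which is the behaviour the third equation for run prescribes.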


5 Denotational semantics 


We provide a coherent denotational semantics for $\lambda_{\mathsf{coop}}$, and prove it sound with
respect to the equational theory given in §4.4. Having eschewed all forms of
recursion, we may afford to work simply over the category of sets and functions,
while noting that there is no obstacle to incorporating recursion at all levels and
switching to domain theory, similarly to the treatment of effect handlers in [3].


5.1 Semantics of types 


The meaning of terms is most naturally defined by structural induction on their
typing derivations, which however are not unique in $\lambda_{\mathsf{coop}}$ due to subsumption
rules. Thus we must worry about devising a coherent semantics, i.e., one in which
all derivations of a judgement get the same meaning. We follow prior work on the
semantics of effect systems for handlers [3], and proceed by first giving a skeletal
semantics of $\lambda_{\mathsf{coop}}$ in which derivations are manifestly unique because the effect
information is unrefined. We then use the skeletal semantics as the frame upon
which rests a refinement-style coherent semantics of the effectful types of $\lambda_{\mathsf{coop}}$.

The skeletal types are like $\lambda_{\mathsf{coop}}$'s types, but with all effect information erased.
In particular, the ground types $A$, and hence the kernel state types $C$, do not
change as they contain no effect information. The skeletal value types are

$$P, Q ::= A \mid \mathsf{unit} \mid \mathsf{empty} \mid P \times Q \mid P + Q \mid P \to Q\,! \mid P \to Q \lightning C \mid \mathsf{runner}\ C.$$

The skeletal versions of the user and kernel types are $P\,!$ and $P \lightning C$, respectively.
It is best to think of the skeletal types as ML-style types which implicitly
over-approximate effect information by "any effect is possible", an idea which is
mathematically expressed by their semantics, as explained below.

First of all, the semantics of ground types is straightforward. One only needs 
to provide sets denoting the base types b, after which the ground types receive 
the standard set-theoretic meaning, as given in Fig. 4. 

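Recall that $\mathcal{O}$, $\mathcal{S}$, and $\mathcal{E}$ are the sets of all operations, signals, and exceptions,
and that each $\mathsf{op} \in \mathcal{O}$ has a signature $\mathsf{op} : A_{\mathsf{op}} \leadsto B_{\mathsf{op}} ! E_{\mathsf{op}}$. Let us additionally
assume that there is a distinguished operation ⚡ $\in \mathcal{O}$ with signature ⚡ $: 1 \leadsto 0\,!\,0$
(otherwise we adjoin it to $\mathcal{O}$). It ensures that the denotations of skeletal user and
kernel types are pointed sets, while operationally ⚡ indicates a runtime error.

Next, we define the skeletal user and kernel monads as

$$U^s X \stackrel{\text{def}}{=} U_{\mathcal{O},\mathcal{E}}\, X = \mathrm{Tree}_{\mathcal{O}}(X + \mathcal{E}),$$
$$K^s_C X \stackrel{\text{def}}{=} K_{\mathcal{O},\mathcal{E},\mathcal{S},C}\, X = (C \Rightarrow \mathrm{Tree}_{\mathcal{O}}((X + \mathcal{E}) \times C + \mathcal{S})),$$

and $\mathrm{Runner}^s\, C$ as the set of all skeletal runners $\mathcal{R}$ (with state $C$), which are families
of co-operations $\{\overline{\mathsf{op}}_{\mathcal{R}} : [\![A_{\mathsf{op}}]\!] \to K_{\mathcal{O},E_{\mathsf{op}},\mathcal{S},C}\,[\![B_{\mathsf{op}}]\!]\}_{\mathsf{op} \in \mathcal{O}}$. Note that $K_{\mathcal{O},E_{\mathsf{op}},\mathcal{S},C}$
is a coproduct [11] of monads $C \Rightarrow \mathrm{Tree}_{\mathcal{O}}(- \times C + \mathcal{S})$ and $\mathrm{Exc}_{E_{\mathsf{op}}}$, and thus the
skeletal runners are the effectful runners for the former monad, so long as we
read the effectful signatures $\mathsf{op} : A_{\mathsf{op}} \leadsto B_{\mathsf{op}} ! E_{\mathsf{op}}$ as ordinary algebraic ones
$\mathsf{op} : A_{\mathsf{op}} \leadsto B_{\mathsf{op}} + E_{\mathsf{op}}$. While there is no semantic difference between the two
readings, there is one of intention: $K_{\mathcal{O},E_{\mathsf{op}},\mathcal{S},C}\,[\![B_{\mathsf{op}}]\!]$ is a kernel computation that
(apart from using state and sending signals) returns values of type $B_{\mathsf{op}}$ and raises
exceptions $E_{\mathsf{op}}$, whereas $C \Rightarrow \mathrm{Tree}_{\mathcal{O}}(([\![B_{\mathsf{op}}]\!] + E_{\mathsf{op}}) \times C + \mathcal{S})$ returns values of
type $B_{\mathsf{op}} + E_{\mathsf{op}}$ and raises no exceptions. We prefer the former, as it reflects our
treatment of exceptions as a control mechanism rather than exceptional values.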

These ingredients suffice for the denotation of skeletal types as sets, as given 
in Fig. 4. The user and kernel skeletal types are interpreted using the respective 
skeletal monads, and hence the two function types as Kleisli exponentials. 

We proceed with the semantics of effectful types. The skeleton of a value
type $X$ is the skeletal type $X^s$ obtained by removing all effect information, and
similarly for user and kernel types, see Fig. 5. We interpret a value type $X$ as a
subset $[\![X]\!] \subseteq [\![X^s]\!]$ of the denotation of its skeleton, and similarly for user and
kernel types. In other words, we treat the effectful types as refinements
of their skeletons. For this, we define the operation $(X_0, X_1) \Rightarrow (Y_0, Y_1)$, for any
$X_0 \subseteq X_1$ and $Y_0 \subseteq Y_1$, as the set of maps $X_1 \to Y_1$ restricted to $X_0 \to Y_0$:

$$(X_0, X_1) \Rightarrow (Y_0, Y_1) \stackrel{\text{def}}{=} \{f : X_1 \to Y_1 \mid \forall x \in X_0 .\ f(x) \in Y_0\}.$$
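
For finite sets, the refinement operation just defined can be enumerated directly. The sketch below is only an illustration: the function name `refined_maps` and the representation of maps as Python dictionaries are our own choices, not part of the paper.

```python
from itertools import product

def refined_maps(x0, x1, y0, y1):
    """All maps f : x1 -> y1 (as dicts) with f(x) in y0 for every x in x0."""
    assert x0 <= x1 and y0 <= y1          # the operation needs subset inclusions
    domain = sorted(x1)
    result = []
    for values in product(sorted(y1), repeat=len(domain)):
        f = dict(zip(domain, values))
        if all(f[x] in y0 for x in x0):   # f restricts to a map x0 -> y0
            result.append(f)
    return result

# X0 = {0} inside X1 = {0, 1}, and Y0 = {"a"} inside Y1 = {"a", "b"}.
maps = refined_maps({0}, {0, 1}, {"a"}, {"a", "b"})
```

Here `maps` contains exactly the maps sending `0` to `"a"`; the value at `1`, which lies outside the refined subset, is unconstrained.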


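Next, observe that the user and the kernel monads preserve subset inclusions, in
the sense that $U_{\Sigma,E}\,X \subseteq U_{\Sigma',E'}\,X'$ and $K_{\Sigma,E,S,C}\,X \subseteq K_{\Sigma',E',S',C}\,X'$ if $\Sigma \subseteq \Sigma'$,
$E \subseteq E'$, $S \subseteq S'$, and $X \subseteq X'$. In particular, we always have $U_{\Sigma,E}\,X \subseteq U^s X$
and $K_{\Sigma,E,S,C}\,X \subseteq K^s_C X$. Finally, let $\mathrm{Runner}_{\Sigma,\Sigma',S}\, C \subseteq \mathrm{Runner}^s\, C$ be the subset
of those runners $\mathcal{R}$ whose co-operations for $\Sigma$ factor through $K_{\Sigma',E_{\mathsf{op}},S,C}$, i.e.,
$\overline{\mathsf{op}}_{\mathcal{R}} : [\![A_{\mathsf{op}}]\!] \to K_{\Sigma',E_{\mathsf{op}},S,C}\,[\![B_{\mathsf{op}}]\!] \subseteq K_{\mathcal{O},E_{\mathsf{op}},\mathcal{S},C}\,[\![B_{\mathsf{op}}]\!]$ for each $\mathsf{op} \in \Sigma$.

Ground types
$$[\![b]\!] \stackrel{\text{def}}{=} \cdots \qquad [\![\mathsf{unit}]\!] \stackrel{\text{def}}{=} 1 \qquad [\![\mathsf{empty}]\!] \stackrel{\text{def}}{=} 0$$
$$[\![A \times B]\!] \stackrel{\text{def}}{=} [\![A]\!] \times [\![B]\!] \qquad [\![A + B]\!] \stackrel{\text{def}}{=} [\![A]\!] + [\![B]\!]$$

Skeletal types
$$[\![P \times Q]\!] \stackrel{\text{def}}{=} [\![P]\!] \times [\![Q]\!] \qquad [\![P \to Q\,!]\!] \stackrel{\text{def}}{=} [\![P]\!] \Rightarrow [\![Q\,!]\!]$$
$$[\![P + Q]\!] \stackrel{\text{def}}{=} [\![P]\!] + [\![Q]\!] \qquad [\![P \to Q \lightning C]\!] \stackrel{\text{def}}{=} [\![P]\!] \Rightarrow [\![Q \lightning C]\!]$$
$$[\![\mathsf{runner}\ C]\!] \stackrel{\text{def}}{=} \mathrm{Runner}^s\, [\![C]\!] \qquad [\![P\,!]\!] \stackrel{\text{def}}{=} U^s [\![P]\!] \qquad [\![P \lightning C]\!] \stackrel{\text{def}}{=} K^s_{[\![C]\!]}\, [\![P]\!]$$
$$[\![x_1 : P_1, \ldots, x_n : P_n]\!] \stackrel{\text{def}}{=} [\![P_1]\!] \times \cdots \times [\![P_n]\!]$$

Fig. 4. Denotations of ground and skeletal types.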


Semantics of effectful types is given in Fig. 5. From a category-theoretic
viewpoint, it assigns meaning in the category Sub(Set) whose objects are subset
inclusions $X_0 \subseteq X_1$, and morphisms from $X_0 \subseteq X_1$ to $Y_0 \subseteq Y_1$ those maps $X_1 \to Y_1$
that restrict to $X_0 \to Y_0$. The interpretations of products, sums, and function
types are precisely the corresponding category-theoretic notions $\times$, $+$, and $\Rightarrow$ in
Sub(Set). Even better, the pairs of submonads $U_{\Sigma,E} \subseteq U^s$ and $K_{\Sigma,E,S,C} \subseteq K^s_C$
are the "Sub(Set)-variants" of the user and kernel monads. Such an abstract
point of view drives the interpretation of terms, given below, and it additionally
suggests how our semantics can be set up on top of a category other than Set. For
example, if we replace Set with the category Cpo of $\omega$-complete partial orders,
we obtain the domain-theoretic semantics of effect handlers from [3] that models
recursion and operations whose signatures contain arbitrary types.


5.2 Semantics of values and computations 
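
To give semantics to $\lambda_{\mathsf{coop}}$'s terms, we introduce skeletal typing judgements

$$\Gamma \vdash^s V : P, \qquad \Gamma \vdash^s M : P\,!, \qquad \Gamma \vdash^s K : P \lightning C,$$

which assign skeletal types to values and computations. In these judgements, $\Gamma$
is a skeletal context which assigns skeletal types to variables.

The rules for these judgements are obtained from $\lambda_{\mathsf{coop}}$'s typing rules, by
excluding subsumption rules and by relaxing restrictions on effects. For example,
the skeletal versions of the rules TyValue-Runner and TyKernel-Kill are

$$\frac{(\Gamma, x : A_{\mathsf{op}} \vdash^s K_{\mathsf{op}} : B_{\mathsf{op}} \lightning C)_{\mathsf{op} \in \Sigma}}{\Gamma \vdash^s \{(\mathsf{op}\ x \mapsto K_{\mathsf{op}})_{\mathsf{op} \in \Sigma}\}_C : \mathsf{runner}\ C} \qquad \frac{s \in \mathcal{S}}{\Gamma \vdash^s \mathsf{kill}_{X \lightning C}\ s : X^s \lightning C}$$

The relationship between effectful and skeletal typing is summarised as follows:

Proposition 5. (1) Skeletal typing derivations are unique. (2) If $X \sqsubseteq Y$, then
$X^s = Y^s$, and analogously for subtyping of user and kernel types. (3) If $\Gamma \vdash V : X$,
then $\Gamma^s \vdash^s V : X^s$, and analogously for user and kernel computations.

Skeletons
$$A^s \stackrel{\text{def}}{=} A \qquad (\Sigma \Rightarrow (\Sigma', S, C))^s \stackrel{\text{def}}{=} \mathsf{runner}\ C \qquad (X \times Y)^s \stackrel{\text{def}}{=} X^s \times Y^s$$
$$(X \to Y ! \mathcal{U})^s \stackrel{\text{def}}{=} X^s \to (Y ! \mathcal{U})^s \qquad (X + Y)^s \stackrel{\text{def}}{=} X^s + Y^s$$
$$(X \to Y \lightning \mathcal{K})^s \stackrel{\text{def}}{=} X^s \to (Y \lightning \mathcal{K})^s \qquad (X ! \mathcal{U})^s \stackrel{\text{def}}{=} X^s\,!$$
$$(x_1 : X_1, \ldots, x_n : X_n)^s \stackrel{\text{def}}{=} (x_1 : X_1^s, \ldots, x_n : X_n^s) \qquad (X \lightning (\Sigma, E, S, C))^s \stackrel{\text{def}}{=} X^s \lightning C$$

Denotations
$$[\![A]\!] \stackrel{\text{def}}{=} [\![A]\!] \qquad [\![X \times Y]\!] \stackrel{\text{def}}{=} [\![X]\!] \times [\![Y]\!]$$
$$[\![\Sigma \Rightarrow (\Sigma', S, C)]\!] \stackrel{\text{def}}{=} \mathrm{Runner}_{\Sigma,\Sigma',S}\, [\![C]\!] \qquad [\![X + Y]\!] \stackrel{\text{def}}{=} [\![X]\!] + [\![Y]\!]$$
$$[\![X \to Y ! \mathcal{U}]\!] \stackrel{\text{def}}{=} ([\![X]\!], [\![X^s]\!]) \Rightarrow ([\![Y ! \mathcal{U}]\!], [\![(Y ! \mathcal{U})^s]\!])$$
$$[\![X \to Y \lightning \mathcal{K}]\!] \stackrel{\text{def}}{=} ([\![X]\!], [\![X^s]\!]) \Rightarrow ([\![Y \lightning \mathcal{K}]\!], [\![(Y \lightning \mathcal{K})^s]\!])$$
$$[\![X ! (\Sigma, E)]\!] \stackrel{\text{def}}{=} U_{\Sigma,E}\, [\![X]\!] \qquad [\![X \lightning (\Sigma, E, S, C)]\!] \stackrel{\text{def}}{=} K_{\Sigma,E,S,[\![C]\!]}\, [\![X]\!]$$
$$[\![x_1 : X_1, \ldots, x_n : X_n]\!] \stackrel{\text{def}}{=} [\![X_1]\!] \times \cdots \times [\![X_n]\!]$$

Fig. 5. Skeletons and denotations of types.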


Proof. We prove (1) by induction on skeletal typing derivations, and (2) by
induction on subtyping derivations. For (1), we further use the occasional type
annotations, and the absence of skeletal subsumption rules. For proving (3),
suppose that $\mathcal{D}$ is a derivation of $\Gamma \vdash V : X$. We may translate $\mathcal{D}$ to its skeleton
$\mathcal{D}^s$ deriving $\Gamma^s \vdash^s V : X^s$ by replacing typing rules with matching skeletal ones,
skipping subsumption rules due to (2). Computations are treated similarly.


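To ensure semantic coherence, we first define the skeletal semantics of skeletal
typing judgements, $[\![\Gamma \vdash^s V : P]\!] : [\![\Gamma]\!] \to [\![P]\!]$, $[\![\Gamma \vdash^s M : P\,!]\!] : [\![\Gamma]\!] \to [\![P\,!]\!]$,
and $[\![\Gamma \vdash^s K : P \lightning C]\!] : [\![\Gamma]\!] \to [\![P \lightning C]\!]$, by induction on their (unique) derivations.

Provided maps $[\![A_1]\!] \times \cdots \times [\![A_n]\!] \to [\![B]\!]$ denoting ground constants $\mathsf{f}$, values
are interpreted in a standard way, using the bi-cartesian closed structure of sets,
except for a runner $\{(\mathsf{op}\ x \mapsto K_{\mathsf{op}})_{\mathsf{op} \in \Sigma}\}_C$, which is interpreted at an environment
$\gamma \in [\![\Gamma]\!]$ as the skeletal runner $\{\overline{\mathsf{op}} : [\![A_{\mathsf{op}}]\!] \to K_{\mathcal{O},E_{\mathsf{op}},\mathcal{S},[\![C]\!]}\,[\![B_{\mathsf{op}}]\!]\}_{\mathsf{op} \in \mathcal{O}}$, given by

$$\overline{\mathsf{op}}\ a \stackrel{\text{def}}{=} (\text{if } \mathsf{op} \in \Sigma \text{ then } \rho([\![\Gamma, x : A_{\mathsf{op}} \vdash^s K_{\mathsf{op}} : B_{\mathsf{op}} \lightning C]\!](\gamma, a)) \text{ else } ⚡).$$

Here the map $\rho : K^s_{[\![C]\!]}\,[\![B_{\mathsf{op}}]\!] \to K_{\mathcal{O},E_{\mathsf{op}},\mathcal{S},[\![C]\!]}\,[\![B_{\mathsf{op}}]\!]$ is the skeletal kernel theory
homomorphism characterised by the equations

$$\begin{array}{ll}
\rho(\mathsf{return}\ b) = \mathsf{return}\ b, & \rho(\mathsf{op}'(a', \kappa, (\nu_e)_{e \in E_{\mathsf{op}'}})) = \mathsf{op}'(a', \rho \circ \kappa, (\rho(\nu_e))_{e \in E_{\mathsf{op}'}}), \\
\rho(\mathsf{getenv}\ \kappa) = \mathsf{getenv}(\rho \circ \kappa), & \rho(\mathsf{raise}\ e) = (\text{if } e \in E_{\mathsf{op}} \text{ then } \mathsf{raise}\ e \text{ else } ⚡), \\
\rho(\mathsf{setenv}(c, \kappa)) = \mathsf{setenv}(c, \rho(\kappa)), & \rho(\mathsf{kill}\ s) = \mathsf{kill}\ s.
\end{array}$$

The purpose of ⚡ in the definition of $\overline{\mathsf{op}}$ is to model a runtime error when the
runner is asked to handle an unexpected operation, while $\rho$ makes sure that $\overline{\mathsf{op}}$
raises at most the exceptions $E_{\mathsf{op}}$, as prescribed by the signature of $\mathsf{op}$.

User and kernel computations are interpreted as elements of the corresponding
skeletal user and kernel monads. Again, most constructs are interpreted in
a standard way: returns as the units of the monads; the operations raise, kill,
getenv, setenv, and ops as the corresponding algebraic operations; and match
statements as the corresponding semantic elimination forms. The interpretation
of exception handling offers no surprises, e.g., as in [30], as long as we follow the
strategy of treating unexpected situations with the runtime error ⚡.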
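
The homomorphism $\rho$ defined above can be pictured as a traversal of a kernel computation tree that turns every exception outside $E_{\mathsf{op}}$ into the runtime error. The Python sketch below is a minimal illustration under our own assumptions: the tree classes and the `Error` leaf standing for the distinguished error operation are hypothetical encodings, not the paper's definitions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

@dataclass
class Return:
    value: Any

@dataclass
class Raise:
    exc: str

@dataclass
class Kill:
    sig: str

@dataclass
class Getenv:
    cont: Callable[[Any], Any]

@dataclass
class Setenv:
    state: Any
    cont: Any

@dataclass
class Op:
    name: str
    arg: Any
    cont: Callable[[Any], Any]
    exc_conts: Dict[str, Any]

@dataclass
class Error:
    """Leaf standing for the distinguished runtime-error operation."""

def rho(t, allowed):
    """Keep only exceptions in `allowed`; any other raise becomes Error()."""
    if isinstance(t, Raise):
        return t if t.exc in allowed else Error()
    if isinstance(t, Getenv):
        return Getenv(lambda c: rho(t.cont(c), allowed))
    if isinstance(t, Setenv):
        return Setenv(t.state, rho(t.cont, allowed))
    if isinstance(t, Op):
        return Op(t.name, t.arg,
                  lambda b: rho(t.cont(b), allowed),
                  {e: rho(n, allowed) for e, n in t.exc_conts.items()})
    return t    # Return, Kill and Error are left unchanged

# A co-operation body that raises "B" when the state is truthy.
body = Getenv(lambda c: Raise("B") if c else Return(1))
restricted = rho(body, {"A"})
```

With `allowed = {"A"}`, the branch of `body` that raises `"B"` becomes `Error()`, while the returning branch is untouched.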

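The most interesting part of the interpretation is the semantics of

$$\Gamma \vdash^s (\mathsf{using}\ V\ @\ W\ \mathsf{run}\ M\ \mathsf{finally}\ F) : Q\,!, \qquad (4)$$

where $F = \{\mathsf{return}\ x\ @\ c \mapsto N, (\mathsf{raise}\ e\ @\ c \mapsto N_e)_{e \in E}, (\mathsf{kill}\ s \mapsto N_s)_{s \in S}\}$. At an
environment $\gamma \in [\![\Gamma]\!]$, $V$ is interpreted as a skeletal runner with state $[\![C]\!]$, which
induces a monad morphism $r : \mathrm{Tree}_{\mathcal{O}}(-) \to ([\![C]\!] \Rightarrow \mathrm{Tree}_{\mathcal{O}}(- \times [\![C]\!] + \mathcal{S}))$, as
in the proof of Prop. 3. Let $f : K^s_{[\![C]\!]}\,[\![P]\!] \to ([\![C]\!] \Rightarrow U^s [\![Q]\!])$ be the skeletal
kernel theory homomorphism characterised by the equations

$$\begin{array}{l}
f(\mathsf{return}\ p) = \lambda c .\ [\![\Gamma, x : P, c : C \vdash^s N : Q\,!]\!](\gamma, p, c), \\
f(\mathsf{op}(a, \kappa, (\nu_e)_{e \in E_{\mathsf{op}}})) = \lambda c .\ \mathsf{op}(a, \lambda b .\ f(\kappa\ b)\ c, (f(\nu_e)\ c)_{e \in E_{\mathsf{op}}}), \\
f(\mathsf{raise}\ e) = \lambda c .\ (\text{if } e \in E \text{ then } [\![\Gamma, c : C \vdash^s N_e : Q\,!]\!](\gamma, c) \text{ else } ⚡), \\
f(\mathsf{kill}\ s) = \lambda c .\ (\text{if } s \in S \text{ then } [\![\Gamma \vdash^s N_s : Q\,!]\!]\ \gamma \text{ else } ⚡), \\
f(\mathsf{getenv}\ \kappa) = \lambda c .\ f(\kappa\ c)\ c, \qquad f(\mathsf{setenv}(c', \kappa)) = \lambda c .\ f\ \kappa\ c'.
\end{array} \qquad (5)$$

The interpretation of (4) at $\gamma$ is $f(r_{[\![P]\!] + \mathcal{E}}([\![\Gamma \vdash^s M : P\,!]\!]\ \gamma))\ ([\![\Gamma \vdash^s W : C]\!]\ \gamma)$,
which reads: map the interpretation of $M$ at $\gamma$ from the skeletal user monad
to the skeletal kernel monad using $r$ (which models the operations of $M$ by the
co-operations of $V$), and from there using $f$ to a map $[\![C]\!] \Rightarrow U^s [\![Q]\!]$, that is then
applied to the initial kernel state, namely, the interpretation of $W$ at $\gamma$.

We interpret the context switch $\Gamma \vdash^s \mathsf{kernel}\ K\ @\ W\ \mathsf{finally}\ F : Q\,!$ at an
environment $\gamma \in [\![\Gamma]\!]$ as $f([\![\Gamma \vdash^s K : P \lightning C]\!]\ \gamma)\ ([\![\Gamma \vdash^s W : C]\!]\ \gamma)$, where $f$ is the
map (5). Finally, user context switch is interpreted much like exception handling.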

We now define coherent semantics of $\lambda_{\mathsf{coop}}$'s typing derivations by passing
through the skeletal semantics. Given a derivation $\mathcal{D}$ of $\Gamma \vdash V : X$, its skeleton
$\mathcal{D}^s$ derives $\Gamma^s \vdash^s V : X^s$. We identify the denotation of $V$ with the skeletal one,

$$[\![\Gamma \vdash V : X]\!] \stackrel{\text{def}}{=} [\![\Gamma^s \vdash^s V : X^s]\!] : [\![\Gamma^s]\!] \to [\![X^s]\!].$$

All that remains is to check that $[\![\Gamma \vdash V : X]\!]$ restricts to $[\![\Gamma]\!] \to [\![X]\!]$. This
is accomplished by induction on $\mathcal{D}$. The only interesting step is subsumption,
which relies on a further observation that $X \sqsubseteq Y$ implies $[\![X]\!] \subseteq [\![Y]\!]$. Typing
derivations for user and kernel computations are treated analogously.


5.3 Coherence, soundness, and finalisation theorems 


We are now ready to prove a theorem that guarantees execution of finalisation
code. But first, let us record the fact that the semantics is coherent and sound.

Theorem 6 (Coherence and soundness). The denotational semantics of
$\lambda_{\mathsf{coop}}$ is coherent, and it is sound for the equational theory of $\lambda_{\mathsf{coop}}$ from §4.4.


Proof. Coherence is established by construction: any two derivations of the same
typing judgement have the same denotation because they are both (the same)
restriction of skeletal semantics. For proving soundness, one just needs to unfold
the denotations of the left- and right-hand sides of equations from §4.4, and
compare them, where some cases rely on suitable substitution lemmas.


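To set the stage for the finalisation theorem, let us consider the computation
using $V$ @ $W$ run $M$ finally $F$, well-typed by the rule TyUser-Run from Fig. 3.
At an environment $\gamma \in [\![\Gamma]\!]$, the finalisation clauses $F$ are captured semantically
by the finalisation map $\phi_\gamma : ([\![X]\!] + E) \times [\![C]\!] + S \to [\![Y ! (\Sigma', E')]\!]$, given by

$$\begin{array}{l}
\phi_\gamma(\iota_1(\iota_1\ x, c)) \stackrel{\text{def}}{=} [\![\Gamma, x : X, c : C \vdash N : Y ! (\Sigma', E')]\!](\gamma, x, c), \\
\phi_\gamma(\iota_1(\iota_2\ e, c)) \stackrel{\text{def}}{=} [\![\Gamma, c : C \vdash N_e : Y ! (\Sigma', E')]\!](\gamma, c), \\
\phi_\gamma(\iota_2(s)) \stackrel{\text{def}}{=} [\![\Gamma \vdash N_s : Y ! (\Sigma', E')]\!]\ \gamma.
\end{array}$$

With $\phi$ in hand, we may formulate the finalisation theorem for $\lambda_{\mathsf{coop}}$, stating that
the semantics of using $V$ @ $W$ run $M$ finally $F$ is a computation tree all of whose
branches end with finalisation clauses from $F$. Thus, unless some enveloping
runner sends a signal, finalisation with $F$ is guaranteed to take place.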


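Theorem 7 (Finalisation). A well-typed run factors through finalisation:

$$[\![\Gamma \vdash (\mathsf{using}\ V\ @\ W\ \mathsf{run}\ M\ \mathsf{finally}\ F) : Y ! (\Sigma', E')]\!]\ \gamma = \phi_\gamma^\dagger\ t,$$

for some $t \in \mathrm{Tree}_{\Sigma'}(([\![X]\!] + E) \times [\![C]\!] + S)$.


Proof. We first prove that $f\ u\ c = \phi_\gamma^\dagger(u\ c)$ holds for all $u \in K_{\Sigma',E,S,[\![C]\!]}\,[\![X]\!]$
and $c \in [\![C]\!]$, where $f$ is the map (5). The proof proceeds by computational
induction on $u$ [29]. The finalisation statement is then just the special case with
$u \stackrel{\text{def}}{=} r_{[\![X]\!] + E}([\![\Gamma \vdash M : X ! (\Sigma, E)]\!]\ \gamma)$ and $c \stackrel{\text{def}}{=} [\![\Gamma \vdash W : C]\!]\ \gamma$.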


6 Runners in action 


Let us show examples that demonstrate how runners can be usefully combined
to provide flexible resource management. We implemented these and other
examples in the language Coop and a library Haskell-Coop, see §7.

To make the code more understandable, we do not adhere strictly to the
syntax of $\lambda_{\mathsf{coop}}$, e.g., we use the generic versions of effects [26], as is customary
in programming, and effectful initialisation of kernel state as discussed in §3.2.


Example 8 (Nesting). In Example 4, we considered a runner fileIO for basic file
operations. Let us suppose that fileIO is implemented by immediate calls to the
operating system. Sometimes, we might prefer to accumulate writes and commit
them all at once, which can be accomplished by interposing between fileIO and
user code the following runner accIO, which accumulates writes in its state:

    { write s' ↦ let s = getenv () in setenv (concat s s') }_string


By nesting the runners, and calling the outer write (the one of fileIO) only in the
finalisation code for accIO, the accumulated writes are committed all at once:


    using fileIO @ (open "hello.txt") run
      using accIO @ (return "") run
        write "Hello, world."; write "Hello, again."
      finally { return x @ s ↦ write s; return x }
    finally { return x @ fh ↦ ... , raise QuotaExceeded @ fh ↦ ... , kill IOError ↦ ... }
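
The nesting above can be mimicked with Python generators. The sketch below is an analogy and not Coop code: user code yields `(op, arg)` requests, each co-operation is a generator that may forward requests to the enclosing runner, and exceptions and signals are left out. All names (`using`, `run_top`, `os_write`, `os_close`) are hypothetical, and the outer finalisation, elided in the example, is assumed here to close the file.

```python
def using(runner, state, body, finally_):
    """Generator analogue of `using runner @ state run body finally ...`.

    `body` is a generator yielding (op, arg) requests. A co-operation is a
    generator function (arg, state) -> (result, new_state); any requests it
    yields travel to the enclosing runner.
    """
    try:
        request = next(body)
        while True:
            op, arg = request
            result, state = yield from runner[op](arg, state)
            request = body.send(result)
    except StopIteration as stop:
        # The user code returned: run the finalisation clause for return.
        return (yield from finally_(stop.value, state))

def run_top(comp, perform):
    """Drive a computation whose remaining requests are served by `perform`."""
    try:
        request = next(comp)
        while True:
            request = comp.send(perform(*request))
    except StopIteration as stop:
        return stop.value

def file_write(s, fh):        # fileIO: an immediate call to the "OS"
    yield ("os_write", (fh, s))
    return None, fh

def acc_write(s, acc):        # accIO: only accumulate in the kernel state
    yield from ()             # no outer requests; keeps this a generator
    return None, acc + s

def user_code():
    yield ("write", "Hello, world.")
    yield ("write", "Hello, again.")
    return "done"

def acc_finally(x, acc):      # return x @ s -> write s; return x
    yield ("write", acc)      # this write is the one of the outer runner
    return x

def file_finally(x, fh):      # assumed outer finalisation: close the file
    yield ("os_close", fh)
    return x

os_calls = []
def perform(op, arg):
    os_calls.append((op, arg))

inner = using({"write": acc_write}, "", user_code(), acc_finally)
outer = using({"write": file_write}, "hello.txt", inner, file_finally)
result = run_top(outer, perform)
```

After the run, `os_calls` records a single `os_write` carrying both strings, followed by the `os_close` issued during the outer finalisation.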


Example 9 (Instrumentation). Above, accIO implements the same signature as
fileIO and thus intercepts operations without the user code being aware of it. This
kind of invisibility can be more generally used to implement instrumentation:


    using { ..., op x ↦ let c = getenv () in setenv (c+1); op x, ... }_int @ (return 0) run
      M
    finally { return x @ c ↦ report_cost c; return x, ... }


Here the interposed runner implements all operations of some enveloping runner, 
by simply forwarding them, while also measuring computational cost by counting 
the total number of operation calls, which is then reported during finalisation. 
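
A generator-based Python analogue of this instrumentation is sketched below. It is an illustration under our own assumptions rather than Coop code: the helper names (`using`, `run_top`, `instrument`) and the `get`/`put` operations of the enveloping environment are hypothetical, and exceptions and signals are omitted.

```python
def using(runner, state, body, finally_):
    """Run generator `body` under `runner`; co-operations are generators
    (arg, state) -> (result, new_state) that may yield outer requests."""
    try:
        request = next(body)
        while True:
            op, arg = request
            result, state = yield from runner[op](arg, state)
            request = body.send(result)
    except StopIteration as stop:
        return (yield from finally_(stop.value, state))

def run_top(comp, perform):
    """Serve the outermost requests with the function `perform`."""
    try:
        request = next(comp)
        while True:
            request = comp.send(perform(*request))
    except StopIteration as stop:
        return stop.value

def instrument(op_names):
    """A runner that forwards every listed operation and counts the calls."""
    def make(op):
        def coop(arg, count):
            result = yield (op, arg)      # forward to the enveloping runner
            return result, count + 1      # setenv (c+1)
        return coop
    return {op: make(op) for op in op_names}

def user_code():
    x = yield ("get", None)
    yield ("put", x + 1)
    y = yield ("get", None)
    return y

def report(x, count):                     # return x @ c -> report_cost c; return x
    yield ("report_cost", count)
    return x

cell = {"value": 0}
reported = []
def perform(op, arg):
    if op == "get":
        return cell["value"]
    if op == "put":
        cell["value"] = arg
        return None
    if op == "report_cost":
        reported.append(arg)
        return None
    raise ValueError(op)

comp = using(instrument(["get", "put"]), 0, user_code(), report)
result = run_top(comp, perform)
```

The user code is unchanged by the interposed runner; only the finalisation clause observes the count.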


Example 10 (ML-style references). Continuing with the theme of nested run- 
ners, they can also be used to implement abstract and safe interfaces to low-level 
resources. For instance, suppose we have a low-level implementation of a mem- 
ory heap that potentially allows unsafe memory access, and we would like to 
implement ML-style references on top of it. A good first attempt is the runner 


    { ref x ↦ let h = getenv () in
              let (r,h') = malloc h x in
              setenv h'; return r,
      get r ↦ let h = getenv () in memread h r,
      put (r, x) ↦ let h = getenv () in memset h r x }_heap


which has the desired interface, but still suffers from three deficiencies that can be 
addressed with further language support. First, abstract types would let us hide 
the fact that references are just memory locations, so that the user code could 
never devise invalid references or otherwise misuse them. Second, our simple 
typing discipline forces all references to hold the same type, but in reality we 
want them to have different types. This could be achieved through quantification 
over types in the low-level implementation of the heap, as we have done in the 
HASKELL-Coop library using HASKELL’s forall. Third, user code could hijack 
a reference and misuse it out of the scope of the runner, which is difficult to 
prevent. In practice the problem does not occur because, so to speak, the runner 
for references is at the very top level, from which user code cannot escape. 


Example 11 (Monotonic state). Nested runners can also implement access
restrictions to resources, with applications in security [8]. For example, we can
restrict the references from the previous example to be used monotonically by
associating a preorder with each reference, which assignments then have to obey.
This idea is similar to how monotonic state is implemented in the F* language [2],
except that we make dynamic checks where F* statically uses dependent types.

While we could simply modify the previous example, it is better to implement
a new runner which is nested inside the previous one, so that we obtain a modular
solution that works with any runner implementing operations ref, get, and put:


    { mref x rel ↦ let r = ref x in
                   let m = getenv () in
                   setenv (add m (r,rel)); return r,
      mget r ↦ get r,
      mput (r, y) ↦ let x = get r in
                    let m = getenv () in
                    match (sel m r) with
                    | inl rel ↦ if (rel x y) then put (r, y)
                                else raise MonotonicityViolation
                    | inr () ↦ kill NoPreoderFound }_map(ref,intRel)


The runner’s state is a map from references to preorders on integers. The co- 
operation mref x rel creates a new reference r initialised with x (by calling ref of 
the outer runner), and then adds the pair (r, rel) to the map stored in the runner’s 
state. Reading is delegated to the outer runner, while assignment first checks that 
the new state is larger than the old one, according to the associated preorder. If 
the preorder is respected, the runner proceeds with assignment (again delegated 
to the outer runner), otherwise it reports a monotonicity violation. We may not 
assume that every reference has an associated preorder, because user code could 
pass to mput a reference that was created earlier outside the scope of the runner. 
If this happens, the runner simply kills the offending user code with a signal. 
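
The behaviour of the monotonic runner can be approximated in plain Python. The sketch below makes several assumptions of its own: the classes `Heap` and `Monotonic` are hypothetical stand-ins for the outer and nested runners, the exception is an ordinary Python exception, and the kill signal is modelled as a `BaseException` subclass so that ordinary `except Exception` handlers in user code do not intercept it.

```python
class MonotonicityViolation(Exception):
    """Exception raised when an assignment violates the preorder."""

class NoPreorderFound(BaseException):
    """Stands in for the kill signal sent for an unknown reference."""

class Heap:
    """Outer runner: plain references with ref, get and put."""
    def __init__(self):
        self.cells = []

    def ref(self, x):
        self.cells.append(x)
        return len(self.cells) - 1

    def get(self, r):
        return self.cells[r]

    def put(self, r, x):
        self.cells[r] = x

class Monotonic:
    """Nested runner: its state maps references to preorders."""
    def __init__(self, outer):
        self.outer = outer
        self.rels = {}

    def mref(self, x, rel):
        r = self.outer.ref(x)             # delegate creation to the outer runner
        self.rels[r] = rel
        return r

    def mget(self, r):
        return self.outer.get(r)          # reading is simply delegated

    def mput(self, r, y):
        x = self.outer.get(r)
        rel = self.rels.get(r)
        if rel is None:
            raise NoPreorderFound(r)      # reference created outside our scope
        if not rel(x, y):
            raise MonotonicityViolation((x, y))
        self.outer.put(r, y)

heap = Heap()
stray = heap.ref(10)                      # created before the nested runner exists
m = Monotonic(heap)
r = m.mref(0, lambda old, new: old <= new)
m.mput(r, 5)
```

A later `m.mput(r, 3)` would raise `MonotonicityViolation` and leave the stored value unchanged, while `m.mput(stray, 11)` would raise `NoPreorderFound`.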


Example 12 (Pairing). Another form of modularity is achieved by pairing
runners. Given two runners $\{(\mathsf{op}\ x \mapsto K_{\mathsf{op}})_{\mathsf{op} \in \Sigma_1}\}_{C_1}$ and $\{(\mathsf{op}'\ x \mapsto K_{\mathsf{op}'})_{\mathsf{op}' \in \Sigma_2}\}_{C_2}$,
e.g., for state and file operations, we can use them side-by-side by combining
them into a single runner with operations $\Sigma_1 + \Sigma_2$ and kernel state $C_1 \times C_2$, as
follows (the co-operations op' of the second runner are treated symmetrically):


    { op x ↦ let (c,c') = getenv () in
             user
               kernel (K_op x) @ c finally {
                 return y @ c'' ↦ return (inl (inl y, c'')),
                 (raise e @ c'' ↦ return (inl (inr e, c'')))_{e ∈ E_op},
                 (kill s ↦ return (inr s))_{s ∈ S_1} }
             with {
               return (inl (inl y, c'')) ↦ setenv (c'', c'); return y,
               return (inl (inr e, c'')) ↦ setenv (c'', c'); raise e,
               return (inr s) ↦ kill s },
      op' x ↦ ... , ... }_{C_1 × C_2}


Notice how the inner kernel context switch passes to the co-operation $K_{\mathsf{op}}$ only
its part of the combined state, and how it returns the result of $K_{\mathsf{op}}$ in a reified
form (which requires treating exceptions and signals as values). The outer user
context switch then receives this reified result, updates the combined state, and
forwards the result (return value, exception, or signal) in unreified form.
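
The pairing construction can be sketched in Python if co-operations are modelled as pure functions that return their outcome in reified form. This encoding is an assumption of ours: an outcome is `("ret", value, state)`, `("exc", name, state)`, or `("sig", name)`, the two runners are assumed to have disjoint operation names, and the function `pair` and both example runners are hypothetical.

```python
def pair(runner1, runner2):
    """Combine two runners into one whose state is a pair (c1, c2)."""
    def lift(coop, side):
        def paired(arg, state):
            outcome = coop(arg, state[side])    # pass only this runner's part
            if outcome[0] == "sig":
                return outcome                  # signals carry no state
            tag, payload, new_part = outcome
            if side == 0:
                new_state = (new_part, state[1])
            else:
                new_state = (state[0], new_part)
            return (tag, payload, new_state)    # forward with combined state
        return paired

    combined = {op: lift(coop, 0) for op, coop in runner1.items()}
    combined.update({op: lift(coop, 1) for op, coop in runner2.items()})
    return combined

# Two hypothetical runners: an integer counter and a line log.
counter = {
    "incr": lambda _, c: ("ret", None, c + 1),
    "get": lambda _, c: ("ret", c, c),
    "check": lambda limit, c: ("exc", "TooBig", c) if c > limit else ("ret", None, c),
}
logger = {
    "log": lambda msg, lines: ("ret", None, lines + [msg]),
    "crash": lambda _, lines: ("sig", "IOError"),
}

both = pair(counter, logger)
state = (0, [])
_, _, state = both["incr"](None, state)
_, _, state = both["log"]("hi", state)
```

Each paired co-operation sees and updates only its own component of the state, leaving the other component untouched.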


7 Implementation 


We accompany the theoretical development with two implementations of $\lambda_{\mathsf{coop}}$:
a prototype language Coop [6], and a Haskell library Haskell-Coop [1].
Coop, implemented in OCaml, demonstrates what a more fully-featured
language based on $\lambda_{\mathsf{coop}}$ might look like. It implements a bi-directional variant
of $\lambda_{\mathsf{coop}}$'s type system, extended with type definitions and algebraic datatypes,
to provide algorithmic typechecking and type inference. The operational
semantics is based on the computation rules of the equational theory from §4.4, but
extended with general recursion, pairing of runners from Example 12, and an
interface to the OCaml runtime called containers. These are essentially top-level
runners defined directly in OCaml. They are a modular and systematic way of
offering several possible top-level runtime environments to the programmer.
The Haskell-Coop library is a shallow embedding of $\lambda_{\mathsf{coop}}$ in Haskell. The
implementation closely follows the denotational semantics of $\lambda_{\mathsf{coop}}$. For instance,
user and kernel monads are implemented as corresponding Haskell monads.
Internally, the library uses the Freer monad of Kiselyov [14] to implement free
model monads for given signatures of operations. The library also provides a
means to run user code via Haskell's top-level monads. For instance, code
that performs input-output operations may be run in Haskell's IO monad.
Haskell's advanced features make it possible to use Haskell-Coop to
implement several extensions to examples from §6. For instance, we implement
ML-style state that allows references holding arbitrary values (of different types),
and state that uses Haskell's type system to track which references are alive.
The library also provides pairing of runners from Example 12, e.g., to combine
state and input-output. We also use the library to demonstrate that ambient
functions from the Koka language [18] can be implemented with runners by
treating their binding and application as co-operations. (These are functions
that are bound dynamically but evaluated in the lexical scope of their binding.)


8 Related work 


Comodels and (ordinary) runners have been used as a natural model of stateful 
top-level behaviour. For instance, Plotkin and Power [27] have given a treatment 
of operational semantics using the tensor product of a model and a comodel. 
Recently, Katsumata, Rivas, and Uustalu have generalised this interaction of 
models and comodels to monads and comonads [13]. An early version of Eff [4]
implemented resources, which were a kind of stateful runners, although they
lacked satisfactory theory. Uustalu [35] has pointed out that runners are the
additional structure that one has to impose on state to run algebraic effects
statefully. Møgelberg and Staton's [21] linear-use state-passing translation also
relies on equipping the state with a comodel structure for the effects at hand.
Our runners arise when their setup is specialised to a certain Kleisli adjunction.

Our use of kernel state is analogous to the use of parameters in parameter- 
passing handlers [30]: their return clause also provides a form of finalisation, as 
the final value of the parameter is available. There is however no guarantee of 
finalisation happening because handlers need not use the continuation linearly. 

The need to tame the excessive generality of handlers, and willingness to give 
it up in exchange for efficiency and predictability, has recently been recognised 
by MULTICORE OCAML’s implementors, who have observed that in practice 
most handlers resume continuations precisely once [9]. In exchange for impres- 
sive efficiency, they require continuations to be used linearly by default, whereas 
discarding and copying must be done explicitly, incurring additional cost. Lei- 
jen [17] has extended handlers in KOKA with a finally clause, whose semantics 
ensures that finalisation happens whenever a handler discards its continuation. 
Leijen also added an initially clause to parameter-passing handlers, which is used 
to compute the initial value of the parameter before handling, but that gets 
executed again every time the handler resumes its continuation. 


9 Conclusion and future work 


We have shown that effectful runners form a mathematically natural and
modular model of resources, modelling not only the top level external resources, but
allowing programmers to also define their own intermediate "virtual machines".
Effectful runners give rise to a bona fide programming concept, an idea we have
captured in a small calculus, called $\lambda_{\mathsf{coop}}$, which we have implemented both as a
language and a library. We have given $\lambda_{\mathsf{coop}}$ an algebraically natural denotational
semantics, and shown how to program with runners through various examples.

We leave combining runners and general effect handlers for future work. As 
runners are essentially affine handlers, inspired by MULTICORE OCAML we also 
plan to investigate efficient compilation for runners. On the theoretical side, by 
developing semantics in a Sub(Cpo)-enriched setting [32], we plan to support 
recursion at all levels, and remove the distinction between ground and arbitrary 
types. Finally, by using proof-relevant subtyping [34] and synthesis of lenses [20],
we plan to upgrade subtyping from a simple inclusion to relating types by lenses.

Abstract. Logical relations are among the most powerful techniques
in the theory of programming languages, and have been used
extensively for proving properties of a variety of higher-order calculi. 
However, there are properties that cannot be immediately proved by 
means of logical relations, for instance program continuity and differen- 
tiability in higher-order languages extended with real-valued functions. 
Informally, the problem stems from the fact that these properties are 
naturally expressed on terms of non-ground type (or, equivalently, on 
open terms of base type), and there is no apparent good definition for 
a base case (i.e. for closed terms of ground types). To overcome this is- 
sue, we study a generalization of the concept of a logical relation, called 
open logical relation, and prove that it can be fruitfully applied in sev- 
eral contexts in which the property of interest is about expressions of 
first-order type. Our setting is a simply-typed λ-calculus enriched with
real numbers and real-valued first-order functions from a given set, such 
as the one of continuous or differentiable functions. We first prove a 
containment theorem stating that for any collection of real-valued first- 
order functions including projection functions and closed under function 
composition, any well-typed term of first-order type denotes a function 
belonging to that collection. Then, we show by way of open logical re- 
lations the correctness of the core of a recently published algorithm for 
forward automatic differentiation. Finally, we define a refinement-based 
type system for local continuity in an extension of our calculus with con- 
ditionals, and prove the soundness of the type system using open logical 
relations. 
1 Introduction


Logical relations have been extremely successful as a way of proving equivalence 
between concrete programs as well as correctness of program transformations. 
In their “unary” version, they also are a formidable tool to prove termination of 
typable programs, through the so-called reducibility technique. The class of pro- 
gramming languages in which these techniques have been instantiated includes 
not only higher-order calculi with simple types, but also calculi with recursion 
[3,2,23], various kinds of effects [14,12,25,36,10,11,34], and concurrency [56,13]. 

Without any aim to be precise, let us see how reducibility works, in the 
setting of a simply typed calculus. The main idea is to define, by induction on 
the structure of types, the concept of a well-behaved program, where in the 
base case one simply makes reference to the underlying notion of observation 
(e.g. being strongly normalizing), while the more interesting case is handled by
stipulating that reducible higher-order terms are those which map reducible
terms to reducible terms, this way exploiting the inductive nature of simple types. 
One can even go beyond the basic setting of simple types, and extend reducibility 
to, e.g., languages with recursive types [23,2] or even untyped languages [44] by 
means of techniques such as step-indexing [3]. 

The same kind of recipe works in a relational setting, where one wants to 
compare programs rather than merely proving properties about them. Again, two 
terms are equivalent at base types if they have the same observable behaviour, 
while at higher types one requires equivalent terms to be those which map
equivalent arguments to equivalent results.

There are cases, however, in which the property one observes, or the property 
on which the underlying notion of program equivalence or correctness is based,
is formulated for types which are not ground (or equivalently, it is formulated 
for open expressions). As an example, one could be interested in proving that in 
a higher-order type system all first-order expressions compute numerical functions
of a specific kind, for example, continuous or differentiable ones. We call such
properties first-order properties. As we will describe in Section 3 below, logical 
relations do not seem to be applicable off-the-shelf to these cases. Informally, 
this is due to the fact that we cannot start by defining a base case for ground 
types and then build the relation inductively. 

In this paper, we show that logical relations and reducibility can deal with 
first-order properties in a compositional way without altering their nature. The 
main idea behind the resulting definition, known as open logical relations [59], 
consists in parameterizing the set of related terms of a certain type (or the 
underlying reducibility set) on a ground environment, this way turning it into a 
set of pairs of open terms. As a consequence, one can define the target first-order 
property in a natural way. 


5 To avoid misunderstandings, we emphasize that we use first-order properties to refer 
to properties of expressions of first-order types—and not in relation with definability 
of properties in first-order predicate logic. 
Generalizations of logical relations to open terms have been used by several
authors, and in several (oftentimes unrelated) contexts (see, for instance,
[15,39,47,30,53]). In this paper, we show how open logical relations constitute a 
powerful technique to systematically prove first-order properties of programs. In 
this respect, the paper’s technical contributions are applications of open logical 
relations to three distinct problems. 

• In Section 4, we use open logical relations to prove a general Containment
Theorem. Such a theorem serves as a vehicle to introduce open logical relations
but is also of independent interest. The theorem states that given a
collection $\mathfrak{F}$ of real-valued functions including projections and closed under
function composition, any first-order term of a simply-typed λ-calculus endowed
with primitives for real numbers and operators computing functions in
$\mathfrak{F}$ itself computes a function in $\mathfrak{F}$. As an instance of such a result, we see that
any first-order term in a simply-typed λ-calculus extended with primitives
for continuous functions computes a continuous function. Although the Containment
Theorem can be derived from previous results by Lafont [41] (see
Section 7), our proof is purely syntactical and consists of a straightforward
application of open logical relations.

• In Section 5, we use open logical relations to prove correctness of a core
algorithm for forward automatic differentiation of simply-typed terms. The 
algorithm is a fragment of the one presented in [50]. More specifically, any 
first-order term is proved to be mapped to another first-order term computing 
its derivative, in the usual sense of mathematical analysis. This goes beyond 
the Containment Theorem by dealing with relational properties. 

• In Section 6, we consider an extended language with an if-then-else construction.
When dealing with continuity, the introduction of conditionals invalidates
the Containment Theorem, since conditionals naturally introduce
discontinuities. To overcome this deficiency, we introduce a refinement type 
system ensuring that first-order typable terms are continuous functions on 
some intended domain, and use open logical relations to prove the soundness 
of the type system. 

Due to space constraints, many details have to be omitted, but can be found in 
an Extended Version of this work [7]. 


2 The Playground 


In order to facilitate the communication of the main ideas behind open logical 
relations and their applications, this paper deals with several vehicle calculi. All 
such calculi can be seen as derived from a unique calculus, denoted by $\Lambda^{\times,\to,R}$,
which thus provides the common ground for our inquiry. The calculus $\Lambda^{\times,\to,R}$ is
obtained by adding to the simply typed λ-calculus with product and arrow types
(which we denote by $\Lambda^{\times,\to}$) a ground type $R$ for real numbers and constants $\underline{r}$
of type $R$, for each real number $r$.

Given a collection $\mathfrak{F}$ of real-valued functions, i.e. functions $f : \mathbb{R}^n \to \mathbb{R}$
(with $n \ge 1$), we endow $\Lambda^{\times,\to,R}$ with an operator $\underline{f}$, for any $f \in \mathfrak{F}$, whose
intended meaning is that whenever $t_1, \ldots, t_n$ compute real numbers $r_1, \ldots, r_n$,
then $\underline{f}(t_1, \ldots, t_n)$ computes $f(r_1, \ldots, r_n)$. We call the resulting calculus $\Lambda^{\times,\to,R}_{\mathfrak{F}}$.
Depending on the application we are interested in, we will take as $\mathfrak{F}$ specific
collections of real-valued functions, such as continuous or differentiable functions.
The syntax and static semantics of $\Lambda^{\times,\to,R}_{\mathfrak{F}}$ are defined in Figure 1, where
$f : \mathbb{R}^n \to \mathbb{R}$ belongs to $\mathfrak{F}$. The static semantics of $\Lambda^{\times,\to,R}_{\mathfrak{F}}$ is based on judgments
of the form $\Gamma \vdash t : \tau$, which have the usual intended meaning. We adopt standard
syntactic conventions as in [6], notably the so-called variable convention. In
particular, we denote by $FV(t)$ the collection of free variables of $t$ and by $s[t/x]$
the capture-avoiding substitution of the expression $t$ for all free occurrences of
$x$ in $s$.


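\[ \tau ::= R \mid \tau \times \tau \mid \tau \to \tau \qquad\qquad \Gamma ::= \cdot \mid x : \tau, \Gamma \]

\[ t ::= x \mid \underline{r} \mid \underline{f}(t, \ldots, t) \mid \lambda x.t \mid t\,t \mid (t, t) \mid t.1 \mid t.2 \]

\[
\frac{}{\Gamma, x : \tau \vdash x : \tau}
\qquad
\frac{}{\Gamma \vdash \underline{r} : R}
\qquad
\frac{\Gamma \vdash t_1 : R \ \cdots\ \Gamma \vdash t_n : R}{\Gamma \vdash \underline{f}(t_1, \ldots, t_n) : R}
\qquad
\frac{\Gamma, x : \tau_1 \vdash t : \tau_2}{\Gamma \vdash \lambda x.t : \tau_1 \to \tau_2}
\]

\[
\frac{\Gamma \vdash s : \tau_1 \to \tau_2 \quad \Gamma \vdash t : \tau_1}{\Gamma \vdash s\,t : \tau_2}
\qquad
\frac{\Gamma \vdash t_1 : \tau \quad \Gamma \vdash t_2 : \sigma}{\Gamma \vdash (t_1, t_2) : \tau \times \sigma}
\qquad
\frac{\Gamma \vdash t : \tau_1 \times \tau_2}{\Gamma \vdash t.i : \tau_i}\ (i \in \{1, 2\})
\]

Fig. 1: Static semantics of $\Lambda^{\times,\to,R}_{\mathfrak{F}}$.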


We do not confine ourselves to a fixed operational semantics (e.g. a call-by-value
operational semantics), but take advantage of the simply-typed nature
of $\Lambda^{\times,\to,R}_{\mathfrak{F}}$ and opt for a set-theoretic denotational semantics. The category of
sets and functions being cartesian closed, the denotational semantics of $\Lambda^{\times,\to,R}_{\mathfrak{F}}$
is standard and associates to any judgment $x_1 : \tau_1, \ldots, x_n : \tau_n \vdash t : \tau$ a function
$[\![x_1 : \tau_1, \ldots, x_n : \tau_n \vdash t : \tau]\!] : \prod_i [\![\tau_i]\!] \to [\![\tau]\!]$, where $[\![\tau]\!]$, the semantics of $\tau$, is
thus defined:

\[ [\![R]\!] = \mathbb{R}; \qquad [\![\tau_1 \to \tau_2]\!] = [\![\tau_2]\!]^{[\![\tau_1]\!]}; \qquad [\![\tau_1 \times \tau_2]\!] = [\![\tau_1]\!] \times [\![\tau_2]\!]. \]

Due to space constraints, we omit the definition of $[\![\Gamma \vdash t : \tau]\!]$ and refer the
reader to any textbook on the subject (such as [43]).
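
As an informal illustration of the set-theoretic semantics just described, the following Python sketch interprets terms as ordinary values and functions: reals as floats, arrow types as Python closures, and products as tuples. The tuple encoding of terms and the name `denote` are our own assumptions, chosen only for this example; the sketch is not the textbook definition referred to above.

```python
# Terms are nested tuples (a hypothetical encoding, not taken from the paper):
#   ("var", x) | ("num", r) | ("op", f, [t1, ..., tn]) | ("lam", x, t)
#   ("app", s, t) | ("pair", t1, t2) | ("proj", i, t)   with i in {1, 2}

def denote(term, env):
    """Meaning of `term` in the environment `env` (a dict: variable -> value)."""
    tag = term[0]
    if tag == "var":
        return env[term[1]]
    if tag == "num":
        return float(term[1])
    if tag == "op":
        _, f, args = term
        return f(*(denote(a, env) for a in args))
    if tag == "lam":
        _, x, body = term
        # An arrow type denotes a set-theoretic function, here a closure.
        return lambda v: denote(body, {**env, x: v})
    if tag == "app":
        return denote(term[1], env)(denote(term[2], env))
    if tag == "pair":
        return (denote(term[1], env), denote(term[2], env))
    if tag == "proj":
        return denote(term[2], env)[term[1] - 1]
    raise ValueError(f"unknown term constructor: {tag}")


mul = lambda a, b: a * b
square = ("lam", "z", ("op", mul, [("var", "z"), ("var", "z")]))
twice = ("lam", "g", ("app", ("var", "g"), ("app", ("var", "g"), ("var", "x"))))

# x : R |- (λg. g (g x)) (λz. z * z) : R denotes the function x ↦ x^4.
term = ("app", twice, square)
print(denote(term, {"x": 3.0}))  # 81.0
```

Although the term uses a higher-order subterm, its denotation as an open term of type R is a first-order function of x, which is the kind of object the properties studied below talk about.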


3 A Fundamental Gap 


In this section, we will look informally at a problem which, apparently, cannot 
be solved using vanilla reducibility or logical relations. This serves both as a 
motivating example and as a justification of some of the design choices we had
to make when designing open logical relations.

Consider the simply-typed λ-calculus $\Lambda^{\times,\to}$, the prototypical example of a
well-behaved higher-order functional programming language. As is well known,
$\Lambda^{\times,\to}$ is strongly normalizing and the technique of logical relations can be applied
on-the-nose. The proof of strong normalization for $\Lambda^{\times,\to}$ is structured around
the definition of a family of reducibility sets of closed terms $\{\mathrm{Red}_\tau\}_\tau$, indexed by
types. At any atomic type $\tau$, $\mathrm{Red}_\tau$ is defined as the set of terms (of type $\tau$) having
the property of interest, i.e. as the collection of strongly normalizing terms. The
set $\mathrm{Red}_{\tau_1 \to \tau_2}$, instead, contains those terms which, when applied to a term in
$\mathrm{Red}_{\tau_1}$, return a term in $\mathrm{Red}_{\tau_2}$. Reducibility sets are afterwards generalised to
open terms, and finally all typable terms are shown to be reducible.

Let us now consider the calculus $\Lambda^{\times,\to,R}_{\mathfrak{F}}$ where $\mathfrak{F}$ contains the addition and
multiplication functions only. This language has already been considered in the 
literature, under the name of higher-order polynomials [22,40], which are crucial 
tools in higher-order complexity theory and resource analysis. Now, let us ask 
ourselves the following question: can we say anything about the nature of those 
functions $\mathbb{R}^n \to \mathbb{R}$ which are denoted by (closed) terms of type $R^n \to R$? Of
course, all the polynomials on the real field can be represented, but can we go 
beyond, thanks to higher-order constructions? The answer is negative: terms of 
type $R^n \to R$ represent all and only the polynomials [5,17]. This result is an
instance of the general containment theorem mentioned at the end of Section 1. 

Let us now focus on proofs of this containment result. It turns out that proofs 
from the literature are not compositional, and rely on “heavyweight” tools, including
strong normalization of $\Lambda^{\times,\to,R}_{\mathfrak{F}}$ and soundness of the underlying operational
semantics. In fact, proving the result using usual reducibility arguments
would not be immediate, precisely because there is no obvious choice for the base 
case. If, for example, we define $\mathrm{Red}_R$ as the set of terms strongly normalizing to
a numeral, $\mathrm{Red}_{R^n \to R}$ as the set of polynomials, and for any other type as usual,
we soon get into trouble: indeed, we would like the two sets of functions

\[ \mathrm{Red}_{R \times R \to R}; \qquad \mathrm{Red}_{R \to (R \to R)} \]


to denote essentially the same set of functions, modulo the adjoint between
$R^2 \to R$ and $R \to (R \to R)$. But this is clearly not the case: just consider the
function f in R —> (R > R) thus defined: 


\[ f(x) = \begin{cases} \lambda y.\, y & \text{if } x \ge 0 \\ \ldots & \text{otherwise.} \end{cases} \]


Clearly, f turns any fixed real number into a polynomial, but when uncurried, it
is far from being a polynomial. In other words, reducibility seems
inadequate to capture situations like the one above, in which the “base case” is
not the one of ground types, but rather the one of first-order types.

Before proceeding any further, it is useful to fix the boundaries of our in- 
vestigation. We are interested in proving that (the semantics of) programs of 
first-order type $R^n \to R$ enjoy first-order properties, such as continuity or differentiability,
under their standard interpretation in calculus and real analysis.
More specifically, our results do not cover notions of continuity and differentiabil- 
ity studied in fields such as (exact) real-number computation [57] or computable 
analysis [58], which have a strong domain-theoretical flavor, and higher-order 
generalizations of continuity and differentiability (see, e.g., [26,27,32,29]). We 
leave for future work the study of open logical relations in these settings. What
this paper aims to provide is a family of lightweight techniques that can be
used to show that practical properties of interest of real-valued functions are
guaranteed to hold when programs are written taking advantage of higher-order
constructors. We believe that the three case studies we present in this paper
both point to the practical scenarios we have in mind and witness
the versatility of our methodology.


4 Warming Up: A Containment Theorem 


In this section we introduce open logical relations in their unary version (i.e. open 
logical predicates). We do so by proving the following Containment Theorem. 


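Theorem 1 (Containment Theorem). Let $\mathfrak{F}$ be a collection of real-valued
functions including projections and closed under function composition. Then,
any $\Lambda^{\times,\to,R}_{\mathfrak{F}}$ term $x_1 : R, \ldots, x_n : R \vdash t : R$ denotes a function (from $\mathbb{R}^n$ to $\mathbb{R}$) in
$\mathfrak{F}$. That is, $[\![x_1 : R, \ldots, x_n : R \vdash t : R]\!] \in \mathfrak{F}$.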


As already remarked in previous sections, notable instances of Theorem 1 
are obtained by taking $\mathfrak{F}$ as the collection of continuous functions, or as the
collection of polynomials. 

Our strategy to prove Theorem 1 consists in defining a logical predicate,
denoted by $\mathcal{F}$, ensuring the denotation of programs of a first-order type to be
in $\mathfrak{F}$, and hereditarily preserving this property at higher-order types. However, $\mathfrak{F}$
being a property of real-valued functions, and the denotation of an open term
of the form $x_1 : R, \ldots, x_n : R \vdash t : R$ being such a function, we shall work with
open terms with free variables of type $R$ and parametrize the candidate logical
predicate by types and environments $\Theta$ containing such variables.

This way, we obtain a family of logical predicates $\mathcal{F}^\Theta_\tau$ acting on terms of the
form $\Theta \vdash t : \tau$. As a consequence, when considering the ground type $R$ and an
environment $\Theta = x_1 : R, \ldots, x_n : R$, we obtain a predicate $\mathcal{F}^\Theta_R$ on expressions
$\Theta \vdash t : R$ which naturally correspond to functions from $\mathbb{R}^n$ to $\mathbb{R}$, for which
belonging to $\mathfrak{F}$ is indeed meaningful.


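Definition 1 (Open Logical Predicate). Let $\Theta = x_1 : R, \ldots, x_n : R$ be a fixed
environment. We define the type-indexed family of predicates $\mathcal{F}^\Theta_\tau$ by induction
on $\tau$ as follows:

\[
\begin{aligned}
t \in \mathcal{F}^\Theta_R &\iff (\Theta \vdash t : R \ \wedge\ [\![\Theta \vdash t : R]\!] \in \mathfrak{F}) \\
t \in \mathcal{F}^\Theta_{\tau_1 \to \tau_2} &\iff (\Theta \vdash t : \tau_1 \to \tau_2 \ \wedge\ \forall s \in \mathcal{F}^\Theta_{\tau_1}.\ ts \in \mathcal{F}^\Theta_{\tau_2}) \\
t \in \mathcal{F}^\Theta_{\tau_1 \times \tau_2} &\iff (\Theta \vdash t : \tau_1 \times \tau_2 \ \wedge\ \forall i \in \{1, 2\}.\ t.i \in \mathcal{F}^\Theta_{\tau_i}).
\end{aligned}
\]

We extend $\mathcal{F}^\Theta_\tau$ to the predicate $\mathcal{F}^{\Gamma,\Theta}_\tau$, where $\Gamma$ ranges over arbitrary environments
(possibly containing variables of type $R$) as follows:

\[ t \in \mathcal{F}^{\Gamma,\Theta}_\tau \iff (\Gamma, \Theta \vdash t : \tau \ \wedge\ \forall \gamma.\ \gamma \in \mathcal{F}^\Theta_\Gamma \implies t\gamma \in \mathcal{F}^\Theta_\tau). \]

Here, $\gamma$ ranges over substitutions⁶ and $\gamma \in \mathcal{F}^\Theta_\Gamma$ holds if the support of $\gamma$ is $\Gamma$ and
$\gamma(x) \in \mathcal{F}^\Theta_\tau$, for any $(x : \tau) \in \Gamma$.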


Notice that Definition 1 ensures first-order real-valued functions to be in $\mathfrak{F}$,
and asks for such a property to be hereditarily preserved at higher-order types.
Lemma 1 states that these conditions are indeed sufficient to guarantee any
$\Lambda^{\times,\to,R}_{\mathfrak{F}}$ term $\Theta \vdash t : R$ to denote a function in $\mathfrak{F}$.


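Lemma 1 (Fundamental Lemma). For all environments $\Gamma, \Theta$ as above, and
for any expression $\Gamma, \Theta \vdash t : \tau$, we have $t \in \mathcal{F}^{\Gamma,\Theta}_\tau$.

Proof. By induction on $t$, observing that $\mathcal{F}^\Theta_\tau$ is closed under denotational semantics:
if $s \in \mathcal{F}^\Theta_\tau$ and $[\![\Theta \vdash t : \tau]\!] = [\![\Theta \vdash s : \tau]\!]$, then $t \in \mathcal{F}^\Theta_\tau$. The proof follows
the same structure as that of Lemma 3, and thus we omit details here.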


Finally, a straightforward application of Lemma 1 gives the desired result, 
namely Theorem 1. 
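
To make the role of the environment of real variables concrete, here is a small Python sketch for the instance in which the collection of functions is that of polynomials. Terms of type R over the environment x : R, y : R are interpreted directly as polynomials in x and y, and Python closures stand in for higher-order terms. The class `Poly` and all other names are our own illustrative assumptions; the sketch only illustrates the statement of the Containment Theorem on one term and is not a substitute for its proof.

```python
from collections import defaultdict


class Poly:
    """A polynomial in n variables: a map from exponent tuples to coefficients."""

    def __init__(self, coeffs):
        # Drop zero coefficients so that equal polynomials have equal maps.
        self.coeffs = {m: c for m, c in coeffs.items() if c != 0}

    @staticmethod
    def var(i, n):
        """The i-th projection, i.e. the polynomial x_i in n variables."""
        return Poly({tuple(int(j == i) for j in range(n)): 1})

    def __add__(self, other):
        out = defaultdict(int)
        for m, c in list(self.coeffs.items()) + list(other.coeffs.items()):
            out[m] += c
        return Poly(out)

    def __mul__(self, other):
        out = defaultdict(int)
        for m1, c1 in self.coeffs.items():
            for m2, c2 in other.coeffs.items():
                out[tuple(a + b for a, b in zip(m1, m2))] += c1 * c2
        return Poly(out)


# The environment x : R, y : R, read as the two projections.
x, y = Poly.var(0, 2), Poly.var(1, 2)

# Higher-order terms, written as closures: function composition,
# squaring, and a shift by the free variable y.
compose = lambda f: lambda g: lambda z: f(g(z))
square = lambda z: z * z
shift = lambda z: z + y

# The first-order term  compose square shift x  denotes (x + y)^2.
result = compose(square)(shift)(x)
print(result.coeffs == {(2, 0): 1, (1, 1): 2, (0, 2): 1})  # True
```

The point of the example is that no intermediate value ever leaves the chosen collection: every subterm of type R is interpreted as a polynomial in the variables of the environment, which mirrors the base case of Definition 1.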


5 Automatic Differentiation 


In this section, we show how we can use open logical relations to prove the 
correctness of (a fragment of) the automatic differentiation algorithm of [50] 
(suitably adapted to our calculus). 

Automatic differentiation [8,9,35] (AD, for short) is a family of techniques 
to efficiently compute the numerical (as opposed to symbolical) derivative of 
a computer program denoting a real-valued function. Roughly speaking, AD 
acts on the code of a program by letting variables incorporate values for their 
derivative, and operators propagate derivatives according to the chain rule of 
differential calculus [52]. Due to its vast applications in machine learning (back- 
propagation [49] being an example of an AD technique) and, most notably, in 
deep learning [9], AD is rapidly becoming a topic of interest in the programming 
language theory community, as witnessed by the new line of research called dif- 
ferentiable programming (see, e.g., [28,50,16,1] for some recent results on AD 
and programming language theory developed in the latter field). 

AD comes in several modes, the two most important ones being the forward
mode (also called tangent mode) and the backward mode (also called reverse 
mode). These can be seen as different ways to compute the chain rule, the former 
by traversing the chain rule from inside to outside, while the latter from outside 
to inside. 


6 We write $t\gamma$ for the result of applying $\gamma$ to variables in $t$.
Here we are concerned with forward mode AD. More specifically, we consider
the forward mode AD algorithm recently proposed in [50]. The latter is based 
on a source-to-source program transformation extracting out of a program t a 
new program Dt whose evaluation simultaneously gives the result of computing 
t and its derivative. This is achieved by augmenting the code of t in such a way 
as to handle dual numbers⁷.

The transformation roughly goes as follows: expressions s of type R are transformed
into dual numbers, i.e. expressions s′ of type R × R, where the first component
of s′ gives the original value of s, and the second component of s′ gives the
derivative of s. Real-valued function symbols are then extended to handle dual 
numbers by applying the chain rule, while other constructors of the language 
are extended pointwise. 
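
The idea of carrying a value together with its derivative can be sketched directly in Python. The class `Dual` and the helper `dsin` below are hypothetical names of our own; the sketch shows dual-number arithmetic in general and is not the source-to-source transformation of [50] itself.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    val: float  # first component: the original value
    der: float  # second component: the derivative

    def __add__(self, other):
        return Dual(self.val + other.val, self.der + other.der)

    def __mul__(self, other):
        # Product rule: (uv)' = u v' + u' v.
        return Dual(self.val * other.val,
                    self.val * other.der + self.der * other.val)


def dsin(d):
    """Sine extended to dual numbers via the chain rule."""
    return Dual(math.sin(d.val), math.cos(d.val) * d.der)


def program(x):
    """The function x ↦ x * x + sin(x), written over dual numbers."""
    return x * x + dsin(x)


# Seeding the input with derivative 1 yields the value and the derivative at 2.
out = program(Dual(2.0, 1.0))
print(out.val, out.der)
```

Here `out.val` is 4 + sin(2) and `out.der` is 4 + cos(2), up to floating-point rounding.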

The algorithm of [50] has been studied by means of benchmarks and, to the 
best of the authors’ knowledge, the only proof of its correctness available in the 
literature⁸ has been given at the time of writing by Huot et al. in [37]. However,
the latter proof relies on denotational semantics, and no operational proof of 
correctness has been given so far. Differentiability being a first-order concept, 
open logical relations are thus a perfect candidate for such a job. 


An AD Program Transformation In the rest of this section, given a differentiable
function $f : \mathbb{R}^n \to \mathbb{R}$, we denote by $\partial_x f : \mathbb{R}^n \to \mathbb{R}$ its partial derivative with
respect to the variable $x$. Let $\mathfrak{D}$ be the collection of (real-valued) differentiable
functions, and let us fix a collection $\mathfrak{F}$ of real-valued functions such that, for any
$f \in \mathfrak{D}$, both $f$ and $\partial_x f$ belong to $\mathfrak{F}$. We also assume $\mathfrak{F}$ to contain functions for
real number arithmetic. Notice that since $\partial_x f$ is not necessarily differentiable,
in general $\partial_x f \notin \mathfrak{D}$.

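We begin by recalling how the program transformation of [50] works on
$\Lambda^{\times,\to,R}_{\mathfrak{D}}$, the extension of $\Lambda^{\times,\to,R}$ with operators for functions in $\mathfrak{D}$. In order
to define the derivative of a $\Lambda^{\times,\to,R}_{\mathfrak{D}}$ expression, we first define an intermediate
program transformation $D : \Lambda^{\times,\to,R}_{\mathfrak{D}} \to \Lambda^{\times,\to,R}_{\mathfrak{F}}$ such that:

\[ \Gamma \vdash t : \tau \implies D\Gamma \vdash Dt : D\tau. \]

The action of $D$ on types, environments, and expressions is defined in Figure 2.
Notice that $t$ is an expression in $\Lambda^{\times,\to,R}_{\mathfrak{D}}$, whereas $Dt$ is an expression in $\Lambda^{\times,\to,R}_{\mathfrak{F}}$.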


Let us comment on the definition of D, beginning with its action on types. Following
the rationale behind forward-mode AD, the map D associates to the type


7 We represent dual numbers [21] as pairs of the form $(x, x')$, with $x, x' \in \mathbb{R}$. The first
component, namely $x$, is subject to the usual real number arithmetic, whereas the
second component, namely $x'$, obeys first-order differentiation arithmetic. Dual
numbers are usually presented, in analogy with complex numbers, as formal sums
of the form $x + x'\varepsilon$, where $\varepsilon$ is an abstract number (an infinitesimal) subject to the
law $\varepsilon^2 = 0$.

8 However, we remark that formal approaches to backward automatic differentiation 
for higher-order languages have been recently proposed in [1,16] (see Section 7). 
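\[ DR = R \times R \qquad D(\tau_1 \times \tau_2) = D\tau_1 \times D\tau_2 \qquad D(\tau_1 \to \tau_2) = D\tau_1 \to D\tau_2 \]

\[ D(\cdot) = \cdot \qquad D(x : \tau, \Gamma) = dx : D\tau, D\Gamma \]

\[ D\underline{r} = (\underline{r}, \underline{0}) \qquad D(\underline{f}(t_1, \ldots, t_n)) = \Big( \underline{f}(Dt_1.1, \ldots, Dt_n.1),\ \sum_{i=1}^{n} \underline{\partial_{x_i} f}(Dt_1.1, \ldots, Dt_n.1) * Dt_i.2 \Big) \]

\[ Dx = dx \qquad D(\lambda x.t) = \lambda dx.Dt \qquad D(s\,t) = (Ds)(Dt) \qquad D(t.i) = Dt.i \qquad D(t_1, t_2) = (Dt_1, Dt_2) \]

Fig. 2: Intermediate transformation D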


R the product type R x R, the first and second components of its inhabitants 
being the original expression and its derivative, respectively. The action of D 
on non-basic types is straightforward and it is designed so that the automatic 
differentiation machinery can handle higher-order expressions in such a way to 
guarantee correctness at real-valued function types. 


The action of D on the usual constructors of the A-calculus is pointwise, 
although it is worth noticing that D associates to any variable $x$ of type $\tau$ a new
variable, which we denote by $dx$, of type $D\tau$. As we are going to see, if $\tau = R$,
then $dx$ acts as a placeholder for a dual number.

More interesting is the action of D on real-valued constructors. To any nu- 
meral r, D associates the pair Dr = (r,0), the derivative of a number being zero. 
Let us now inspect the action of D on an operator $\underline{f}$ associated to $f : \mathbb{R}^n \to \mathbb{R}$
(we treat $f$ as a function in the variables $x_1, \ldots, x_n$). The interesting part is the
second component of $D(\underline{f}(t_1, \ldots, t_n))$, namely

\[ \sum_{i=1}^{n} \underline{\partial_{x_i} f}(Dt_1.1, \ldots, Dt_n.1) * Dt_i.2 \]

where $\sum_{i=1}^{n}$ and $*$ denote the operators (of $\Lambda^{\times,\to,R}_{\mathfrak{F}}$) associated to summation
and (binary) multiplication (for readability we omit the underline notation), and
$\underline{\partial_{x_i} f}$ is the operator (of $\Lambda^{\times,\to,R}_{\mathfrak{F}}$) associated to the partial derivative $\partial_{x_i} f$ of $f$ in the
variable $x_i$. It is not hard to recognize that the above expression is nothing but
an instance of the chain rule.


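Finally, we notice that if $\Gamma \vdash t : \tau$ is a (derivable) judgment in $\Lambda^{\times,\to,R}_{\mathfrak{D}}$ then
indeed $D\Gamma \vdash Dt : D\tau$ is a (derivable) judgment in $\Lambda^{\times,\to,R}_{\mathfrak{F}}$.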
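
The clauses defining D can be prototyped as a function on syntax trees. The following Python sketch is ours and makes several assumptions that are not in the paper: terms are encoded as nested tuples, operators live in a small table `OPS` together with their partial derivatives, the tag `"dop"` stands for the operator associated to a partial derivative, and the n-ary summation is unfolded into binary additions starting from the numeral 0. The evaluator `ev` is included only so that the output of `D` can be run.

```python
import math

# Operator table (illustrative): name -> (function, [partial derivatives]).
OPS = {
    "add": (lambda a, b: a + b, [lambda a, b: 1.0, lambda a, b: 1.0]),
    "mul": (lambda a, b: a * b, [lambda a, b: b, lambda a, b: a]),
    "sin": (math.sin, [math.cos]),
}


def D(t):
    """Map a term to its dual-number version, clause by clause."""
    tag = t[0]
    if tag == "num":
        return ("pair", t, ("num", 0.0))
    if tag == "var":
        return ("var", "d" + t[1])
    if tag == "lam":
        return ("lam", "d" + t[1], D(t[2]))
    if tag == "app":
        return ("app", D(t[1]), D(t[2]))
    if tag == "pair":
        return ("pair", D(t[1]), D(t[2]))
    if tag == "proj":
        return ("proj", t[1], D(t[2]))
    if tag == "op":
        name, args = t[1], [D(a) for a in t[2]]
        firsts = [("proj", 1, a) for a in args]
        deriv = ("num", 0.0)
        for i, a in enumerate(args):
            # Chain rule summand: (i-th partial of f) * (derivative of i-th argument).
            summand = ("op", "mul", [("dop", name, i, firsts), ("proj", 2, a)])
            deriv = ("op", "add", [deriv, summand])
        return ("pair", ("op", name, firsts), deriv)
    raise ValueError(f"unknown term constructor: {tag}")


def ev(t, env):
    """Evaluate a (possibly transformed) term in an environment."""
    tag = t[0]
    if tag == "num":
        return t[1]
    if tag == "var":
        return env[t[1]]
    if tag == "lam":
        return lambda v: ev(t[2], {**env, t[1]: v})
    if tag == "app":
        return ev(t[1], env)(ev(t[2], env))
    if tag == "pair":
        return (ev(t[1], env), ev(t[2], env))
    if tag == "proj":
        return ev(t[2], env)[t[1] - 1]
    if tag == "op":
        return OPS[t[1]][0](*(ev(a, env) for a in t[2]))
    if tag == "dop":
        return OPS[t[1]][1][t[2]](*(ev(a, env) for a in t[3]))
    raise ValueError(f"unknown term constructor: {tag}")


# The term (λz. z * sin z) x, with x a free variable of type R.
body = ("op", "mul", [("var", "z"), ("op", "sin", [("var", "z")])])
t = ("app", ("lam", "z", body), ("var", "x"))

# Binding dx to the dual number (2, 1) differentiates with respect to x at 2.
val, der = ev(D(t), {"dx": (2.0, 1.0)})
print(val, der)
```

Up to floating-point rounding, `val` is 2 sin(2) and `der` is sin(2) + 2 cos(2), the derivative of x sin(x) at 2.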


Example 1. Let us consider the binary function $f(x_1, x_2) = \sin(x_1) + \cos(x_2)$.
For readability, we overload the notation writing $f$ in place of $\underline{f}$ (and similarly
for $\partial_{x_i} f$). Given expressions $t_1, t_2$, we compute $D(\sin(t_1) + \cos(t_2))$. Recall that
$\partial_{x_1} f(x_1, x_2) = \cos(x_1)$ and $\partial_{x_2} f(x_1, x_2) = -\sin(x_2)$. We have:

\[
\begin{aligned}
& D(\sin(t_1) + \cos(t_2)) \\
&= (\sin(Dt_1.1) + \cos(Dt_2.1),\ \partial_{x_1} f(Dt_1.1, Dt_2.1) * Dt_1.2 + \partial_{x_2} f(Dt_1.1, Dt_2.1) * Dt_2.2) \\
&= (\sin(Dt_1.1) + \cos(Dt_2.1),\ \cos(Dt_1.1) * Dt_1.2 - \sin(Dt_2.1) * Dt_2.2).
\end{aligned}
\]

As a consequence, we see that $D(\lambda x.\lambda y.\, \sin(x) + \cos(y))$ is
$\lambda dx.\lambda dy.\, (\sin(dx.1) + \cos(dy.1),\ \cos(dx.1) * dx.2 - \sin(dy.1) * dy.2)$.


We now aim to define the derivative of an expression $x_1 : R, \ldots, x_n : R \vdash t : R$
with respect to a variable $x$ (of type $R$). In order to do so we first associate to
any variable $y : R$ its dual expression $\mathrm{dual}_x(y) : R \times R$ defined as:

\[ \mathrm{dual}_x(y) = \begin{cases} (y, 1) & \text{if } x = y \\ (y, 0) & \text{otherwise.} \end{cases} \]

Next, we define for $x_1 : R, \ldots, x_n : R \vdash t : R$ the derivative $\mathrm{deriv}(x, t)$ of $t$ with
respect to $x$ as:

\[ \mathrm{deriv}(x, t) = Dt[\mathrm{dual}_x(x_1)/dx_1, \ldots, \mathrm{dual}_x(x_n)/dx_n].2 \]

Let us clarify this passage with a simple example.


Example 2. Let us compute the derivative of $x : R, y : R \vdash t : R$, where $t = x * y$.
We first of all compute $Dt$, obtaining:

\[ dx : R \times R,\ dy : R \times R \vdash ((dx.1) * (dy.1),\ (dx.1) * (dy.2) + (dx.2) * (dy.1)) : R \times R. \]

Observing that $\mathrm{dual}_x(x) = (x, 1)$ and $\mathrm{dual}_x(y) = (y, 0)$, we indeed obtain the
desired derivative as $x : R, y : R \vdash Dt[\mathrm{dual}_x(x)/dx, \mathrm{dual}_x(y)/dy].2 : R$. For we
have:

\[
\begin{aligned}
& [\![x : R, y : R \vdash Dt[\mathrm{dual}_x(x)/dx, \mathrm{dual}_x(y)/dy].2 : R]\!] \\
&= [\![x : R, y : R \vdash (x * y,\ x * 0 + 1 * y).2 : R]\!] \\
&= [\![x : R, y : R \vdash y : R]\!] = \partial_x [\![x : R, y : R \vdash x * y : R]\!].
\end{aligned}
\]
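
The computation of Example 2 can be replayed numerically. In the Python sketch below, pairs are tuples, `Dt` is the transformed term of the example written by hand, and `dual` and `deriv` follow the definitions above; passing variable names as strings is our own device for comparing variables and is an assumption of the sketch.

```python
def dual(wrt, name, value):
    """dual_wrt(name): the pair (value, 1) if wrt = name, and (value, 0) otherwise."""
    return (value, 1.0 if wrt == name else 0.0)


def Dt(dx, dy):
    """D(x * y) from Example 2, acting on pairs (value, derivative)."""
    return (dx[0] * dy[0], dx[0] * dy[1] + dx[1] * dy[0])


def deriv(wrt, x, y):
    """deriv(wrt, x * y) evaluated at the point (x, y)."""
    return Dt(dual(wrt, "x", x), dual(wrt, "y", y))[1]


print(deriv("x", 3.0, 5.0))  # 5.0, the value of y
print(deriv("y", 3.0, 5.0))  # 3.0, the value of x
```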


Remark 1. For $\Theta = x_1 : R, \ldots, x_n : R$ we have $\Theta \vdash \mathrm{dual}_y(x_i) : DR$ and $\Theta \vdash
Ds[\mathrm{dual}_y(x_1)/dx_1, \ldots, \mathrm{dual}_y(x_n)/dx_n] : D\tau$, for any variable $y$ and $\Theta \vdash s : \tau$.


Open Logical relations for AD We have claimed that the operation deriv performs
automatic differentiation of $\Lambda^{\times,\to,R}_{\mathfrak{D}}$ expressions. By that we mean that
once applied to expressions of the form $x_1 : R, \ldots, x_n : R \vdash t : R$, the operation
deriv can be used to compute the derivative of $[\![x_1 : R, \ldots, x_n : R \vdash t : R]\!]$. We
now show how we can prove such a statement using open logical relations, this
way providing a proof of correctness of our AD program transformation.

We begin by defining a logical relation $\mathcal{R}$ between $\Lambda^{\times,\to,R}_{\mathfrak{D}}$ and $\Lambda^{\times,\to,R}_{\mathfrak{F}}$ expressions.
We design $\mathcal{R}$ in such a way that (i) $t \,\mathcal{R}\, Dt$ and (ii) if $t \,\mathcal{R}\, s$ and $t$ inhabits
a first-order type, then indeed $s$ corresponds to the derivative of $t$. While (ii)
essentially holds by definition, (i) requires some effort in order to be proved.
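Definition 2 (Open Logical Relation). Let $\Theta = x_1 : R, \ldots, x_n : R$ be a fixed,
arbitrary environment. Define the family of relations $(\mathcal{R}^\Theta_\tau)_{\Theta,\tau}$ between $\Lambda^{\times,\to,R}_{\mathfrak{D}}$
and $\Lambda^{\times,\to,R}_{\mathfrak{F}}$ expressions by induction on $\tau$ as follows:

\[
t \,\mathcal{R}^\Theta_R\, s \iff
\begin{cases}
\Theta \vdash t : R \ \wedge\ D\Theta \vdash s : R \times R \\
\forall y : R. \\
\quad [\![\Theta \vdash s[\mathrm{dual}_y(x_1)/dx_1, \ldots, \mathrm{dual}_y(x_n)/dx_n].1 : R]\!] = [\![\Theta \vdash t : R]\!] \\
\quad [\![\Theta \vdash s[\mathrm{dual}_y(x_1)/dx_1, \ldots, \mathrm{dual}_y(x_n)/dx_n].2 : R]\!] = \partial_y [\![\Theta \vdash t : R]\!]
\end{cases}
\]

\[
t \,\mathcal{R}^\Theta_{\tau_1 \to \tau_2}\, s \iff
\begin{cases}
\Theta \vdash t : \tau_1 \to \tau_2 \ \wedge\ D\Theta \vdash s : D\tau_1 \to D\tau_2 \\
\forall p, q.\ p \,\mathcal{R}^\Theta_{\tau_1}\, q \implies tp \,\mathcal{R}^\Theta_{\tau_2}\, sq
\end{cases}
\]

\[
t \,\mathcal{R}^\Theta_{\tau_1 \times \tau_2}\, s \iff
\begin{cases}
\Theta \vdash t : \tau_1 \times \tau_2 \ \wedge\ D\Theta \vdash s : D\tau_1 \times D\tau_2 \\
\forall i \in \{1, 2\}.\ t.i \,\mathcal{R}^\Theta_{\tau_i}\, s.i
\end{cases}
\]

We extend $\mathcal{R}^\Theta_\tau$ to the family $(\mathcal{R}^{\Gamma,\Theta}_\tau)_{\Gamma,\Theta,\tau}$, where $\Gamma$ ranges over arbitrary environments
(possibly containing variables of type $R$), as follows:

\[ t \,\mathcal{R}^{\Gamma,\Theta}_\tau\, s \iff (\Gamma, \Theta \vdash t : \tau) \wedge (D\Gamma, D\Theta \vdash s : D\tau) \wedge (\forall \gamma, \delta.\ \gamma \,\mathcal{R}^\Theta_\Gamma\, \delta \implies t\gamma \,\mathcal{R}^\Theta_\tau\, s\delta) \]

where $\gamma, \delta$ range over substitutions, and:

\[ \gamma \,\mathcal{R}^\Theta_\Gamma\, \delta \iff (\mathrm{supp}(\gamma) = \Gamma) \wedge (\mathrm{supp}(\delta) = D\Gamma) \wedge (\forall (x : \tau) \in \Gamma.\ \gamma(x) \,\mathcal{R}^\Theta_\tau\, \delta(dx)). \]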

Obviously, Definition 2 satisfies condition (ii) above. What remains to be 
done is to show that it satisfies condition (i) as well. In order to prove such a 
result, we first need to show that the logical relation respects the denotational 
semantics of $\Lambda^{\times,\to,R}_{\mathfrak{F}}$.


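Lemma 2. Let $\Theta = x_1 : R, \ldots, x_n : R$. Then, the following hold:

\[
\begin{aligned}
t' \,\mathcal{R}^\Theta_\tau\, s \ \wedge\ [\![\Theta \vdash t' : \tau]\!] = [\![\Theta \vdash t : \tau]\!] &\implies t \,\mathcal{R}^\Theta_\tau\, s \\
t \,\mathcal{R}^\Theta_\tau\, s' \ \wedge\ [\![D\Theta \vdash s' : D\tau]\!] = [\![D\Theta \vdash s : D\tau]\!] &\implies t \,\mathcal{R}^\Theta_\tau\, s.
\end{aligned}
\]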


Proof. A standard induction on T. 


We are now ready to state and prove the main result of this section. 


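Lemma 3 (Fundamental Lemma). For all environments $\Gamma, \Theta$ and for any
expression $\Gamma, \Theta \vdash t : \tau$, we have $t \,\mathcal{R}^{\Gamma,\Theta}_\tau\, Dt$.


Proof. We prove the following statement, by induction on $t$:

\[ \forall t.\ \forall \tau.\ \forall \Gamma, \Theta.\ (\Gamma, \Theta \vdash t : \tau \implies t \,\mathcal{R}^{\Gamma,\Theta}_\tau\, Dt). \]

We show only the most relevant cases. Suppose $t$ is a variable $x$. We distinguish
whether $x$ belongs to $\Gamma$ or $\Theta$.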
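1. Suppose $(x : R) \in \Theta$. We have to show $x \,\mathcal{R}^{\Gamma,\Theta}_R\, dx$, i.e.

\[
\begin{aligned}
[\![\Theta \vdash dx[\mathrm{dual}_y(x)/dx].1 : R]\!] &= [\![\Theta \vdash x : R]\!] \\
[\![\Theta \vdash dx[\mathrm{dual}_y(x)/dx].2 : R]\!] &= \partial_y [\![\Theta \vdash x : R]\!]
\end{aligned}
\]

for any variable $y$ (of type $R$). The first identity obviously holds as

\[ [\![\Theta \vdash dx[\mathrm{dual}_y(x)/dx].1 : R]\!] = [\![\Theta \vdash dx[(x, b)/dx].1 : R]\!] = [\![\Theta \vdash x : R]\!], \]

where $b \in \{0, 1\}$. For the second identity we distinguish whether $y = x$ or
$y \ne x$. In the former case we have $\mathrm{dual}_y(x) = (x, 1)$, and thus:

\[ [\![\Theta \vdash dx[\mathrm{dual}_y(x)/dx].2 : R]\!] = [\![\Theta \vdash 1 : R]\!] = \partial_y [\![\Theta \vdash y : R]\!]. \]

In the latter case we have $\mathrm{dual}_y(x) = (x, 0)$, and thus:

\[ [\![\Theta \vdash dx[\mathrm{dual}_y(x)/dx].2 : R]\!] = [\![\Theta \vdash 0 : R]\!] = \partial_y [\![\Theta \vdash x : R]\!]. \]

2. Suppose $(x : \tau) \in \Gamma$. We have to show $x \,\mathcal{R}^{\Gamma,\Theta}_\tau\, dx$, i.e. $\gamma(x) \,\mathcal{R}^\Theta_\tau\, \delta(dx)$, for all
substitutions $\gamma, \delta$ such that $\gamma \,\mathcal{R}^\Theta_\Gamma\, \delta$. Since $x$ belongs to $\Gamma$, we are trivially
done.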

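Suppose $t$ is $\lambda x.s$, so that we have

\[ \frac{\Gamma, \Theta, x : \tau_1 \vdash s : \tau_2}{\Gamma, \Theta \vdash \lambda x.s : \tau_1 \to \tau_2} \]

for some types $\tau_1, \tau_2$. As $x$ is bound in $\lambda x.s$, without loss of generality we can
assume $(x : \tau_1) \notin \Gamma \cup \Theta$. Let $\Delta = \Gamma, x : \tau_1$, so that we have $\Delta, \Theta \vdash s : \tau_2$, and
thus $s \,\mathcal{R}^{\Delta,\Theta}_{\tau_2}\, Ds$, by induction hypothesis. By definition of open logical relation,
we have to prove that for arbitrary $\gamma, \delta$ such that $\gamma \,\mathcal{R}^\Theta_\Gamma\, \delta$, we have

\[ \lambda x.s\gamma \ \mathcal{R}^\Theta_{\tau_1 \to \tau_2}\ \lambda dx.(Ds)\delta, \]

i.e. $(\lambda x.s\gamma)p \,\mathcal{R}^\Theta_{\tau_2}\, (\lambda dx.(Ds)\delta)q$, for all $p \,\mathcal{R}^\Theta_{\tau_1}\, q$. Let us fix a pair $(p, q)$ as above.
By Lemma 2, it is sufficient to show $(s\gamma)[p/x] \,\mathcal{R}^\Theta_{\tau_2}\, ((Ds)\delta)[q/dx]$. Let $\gamma', \delta'$ be the
substitutions defined as follows:

\[
\gamma'(y) = \begin{cases} p & \text{if } y = x \\ \gamma(y) & \text{otherwise} \end{cases}
\qquad
\delta'(y) = \begin{cases} q & \text{if } y = dx \\ \delta(y) & \text{otherwise.} \end{cases}
\]

It is easy to see that $\gamma' \,\mathcal{R}^\Theta_\Delta\, \delta'$, so that by $s \,\mathcal{R}^{\Delta,\Theta}_{\tau_2}\, Ds$ (recall that the latter follows
by induction hypothesis) we infer $s\gamma' \,\mathcal{R}^\Theta_{\tau_2}\, (Ds)\delta'$, by the very definition of open
logical relation. As a consequence, the thesis is proved if we show

\[ (s\gamma)[p/x] = s\gamma'; \qquad ((Ds)\delta)[q/dx] = (Ds)\delta'. \]

The above identities hold if $x \notin FV(\gamma(y))$ and $dx \notin FV(\delta(dy))$, for any $(y :
\tau) \in \Gamma$. This is indeed the case, since $\gamma(y) \,\mathcal{R}^\Theta_\tau\, \delta(dy)$ implies $\Theta \vdash \gamma(y) : \tau$ and
$D\Theta \vdash \delta(dy) : D\tau$, and $x \notin \Theta$ (and thus $dx \notin D\Theta$).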
A direct application of Lemma 3 allows us to conclude the correctness of
the program transformation D. In fact, given a first-order term $\Theta \vdash t : R$, with
$\Theta = x_1 : R, \ldots, x_n : R$, by Lemma 3 we have $t \,\mathcal{R}^\Theta_R\, Dt$, and thus

\[ \partial_y [\![\Theta \vdash t : R]\!] = [\![\Theta \vdash Dt[\mathrm{dual}_y(x_1)/dx_1, \ldots, \mathrm{dual}_y(x_n)/dx_n].2 : R]\!], \]

for any real-valued variable $y$, meaning that $Dt$ indeed computes the partial
derivative of $t$.


Theorem 2. For any term $\Theta \vdash t : R$ as above, the term $D\Theta \vdash Dt : DR$ computes
the partial derivative of $t$, i.e., for any variable $y$ we have

\[ \partial_y [\![\Theta \vdash t : R]\!] = [\![\Theta \vdash Dt[\mathrm{dual}_y(x_1)/dx_1, \ldots, \mathrm{dual}_y(x_n)/dx_n].2 : R]\!]. \]


6 On Refinement Types and Local Continuity 


In Section 4, we exploited open logical relations to establish a containment theorem
for the calculus $\Lambda^{\times,\to,R}_{\mathfrak{F}}$, i.e. the calculus $\Lambda^{\times,\to,R}$ extended with real-valued
functions belonging to a set $\mathfrak{F}$ including projections and closed under function
composition. Since the collection $\mathfrak{C}$ of (real-valued) continuous functions satisfies
both constraints, Theorem 1 allows us to conclude that all first-order terms of
$\Lambda^{\times,\to,R}_{\mathfrak{C}}$ represent continuous functions.

The aim of the present section is the development of a framework to prove 
continuity properties of programs in a calculus that goes beyond $\Lambda^{\times,\to,R}_{\mathfrak{C}}$. More
specifically, (i) we do not restrict our analysis to calculi having operators rep- 
resenting continuous real-valued functions only, but consider operators for ar- 
bitrary real-valued functions, and (ii) we add to our calculus an if-then-else 
construct whose static semantics is captured by the following rule: 


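\[ \frac{\Gamma \vdash t : R \quad \Gamma \vdash s : \tau \quad \Gamma \vdash p : \tau}{\Gamma \vdash \mathtt{if}\ t\ \mathtt{then}\ s\ \mathtt{else}\ p : \tau} \]


The intended dynamic semantics of the term if t then s else p is the same as
the one of s whenever t evaluates to any real number $r \ne 0$ and the same as the
one of p if it evaluates to 0.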


Notice that the crux of the problem we aim to solve is the presence of the 
if-then-else construct. Indeed, independently of point (i), such a construct breaks 
the global continuity of programs, as illustrated in Figure 3a. As a consequence 
we are forced to look at local continuity properties, instead: for instance we 
can say that the program of Figure 3a is continuous both on $\mathbb{R}_{<0}$ and $\mathbb{R}_{\ge 0}$.
Observe that guaranteeing local continuity allows us (up to a certain point) to 
recover the ability of approximating the output of a program by approximating 
its input. Indeed, if a program $t : R \times \cdots \times R \to R$ is locally continuous on a
subset $X$ of $\mathbb{R}^n$, then the value of $ts$ (for some input $s$) can be approximated
[Plots of $[\![t]\!](x)$ against $x$ for the two programs below]

(a) t = λx.if x < 0 then −x else x + 1    (b) t = λx.if x < 0 then 1 else x + 1


Fig. 3: Simply typed first-order programs with branches


by passing as argument to $t$ a family $(s_n)_{n \in \mathbb{N}}$ of approximations of $s$, as long as
both $s$ and all the $(s_n)_{n \in \mathbb{N}}$ are indeed elements of $X$. Notice that the continuity
domains we are interested in are not necessarily open sets: we could for instance
be interested in functions that are continuous on the unit circle, i.e. the points
$\{(a, b) \mid a^2 + b^2 = 1\} \subset \mathbb{R}^2$. For this reason we will work with the notion
of sequential continuity, instead of the usual topological notion of continuity. 
It must be observed, however, that these two notions coincide as soon as the 
continuity domain X is actually an open set. 


Definition 3 (Sequential Continuity). Let $f : R^n \to R$, and X be any subset
of $R^n$. We say that f is (sequentially) continuous on X if for every $x \in X$, and
for every sequence $(x_n)_{n \in N}$ of elements of X such that $\lim_{n \to \infty} x_n = x$, it holds
that $\lim_{n \to \infty} f(x_n) = f(x)$.
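Definition 3 can be probed numerically on the two programs of Figure 3. The following Python snippet is only an illustrative sketch, not part of the formal development: the names `prog_a`, `prog_b`, and `images_along` are hypothetical, and a finite sequence can suggest, but never prove, (dis)continuity.

```python
def prog_a(x: float) -> float:
    """Figure 3a: lambda x. if x < 0 then -x else x + 1."""
    return -x if x < 0 else x + 1


def prog_b(x: float) -> float:
    """Figure 3b: lambda x. if x < 0 then 1 else x + 1."""
    return 1.0 if x < 0 else x + 1


def images_along(f, limit: float, side: int, steps: int = 6):
    """Evaluate f along the sequence x_n = limit + side * 10**-n, n = 1..steps."""
    return [f(limit + side * 10.0 ** -n) for n in range(1, steps + 1)]


# Sequences converging to 0 from the left (inside R_{<0}) and from the right.
left_a = images_along(prog_a, 0.0, -1)
right_a = images_along(prog_a, 0.0, +1)
left_b = images_along(prog_b, 0.0, -1)

# For Figure 3a the left images tend to 0 while prog_a(0) is 1, so the
# sequence x_n -> 0 witnesses a failure of sequential continuity on R.
# For Figure 3b the left images are constantly 1, which is prog_b(0).
print(left_a[-1], prog_a(0.0))
print(left_b[-1], prog_b(0.0))
```

Restricting the admissible sequences to $R_{<0}$ or to $R_{\geq 0}$ removes the offending sequence for Figure 3a, which is exactly the sense in which that program is locally continuous on each of the two sets.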


In [18], Chaudhuri et al. introduced a logical system designed to guarantee
local continuity properties on programs in an imperative (first-order) programming
language with conditional branches and loops. In this section, we develop
a similar system in the setting of a higher-order functional language with an
if-then-else construct, and we use open logical relations to prove the soundness
of our system. This witnesses, in yet another situation, the versatility of
open logical relations. Compared to [18], we generalize from a result
on programs built from only first-order constructs and primitive functions to a
containment result for programs built using also higher-order constructs.

We mention, however, that although our system is inspired by the work of
Chaudhuri et al., there are significant differences between the two, even at the
first-order level. The consequences these differences have on the expressive power
of our systems are twofold:

• On the one hand, while inferring continuity on some domain X of a program
of the form if t then s else p, we have more flexibility than [18] for the
domains of continuity of s and p. To be more concrete, let us consider the
program λx.(if (x > 0) then 0 else (if x = 4 then 1 else 0)), which is
continuous on R even though the second branch is continuous on $R_{<0}$, but
not on R. We are able to show in our system that this program is indeed
continuous on the whole domain R, while Chaudhuri et al. cannot do the
same in their system for the corresponding imperative program: they ask the
domain of continuity of each of the two branches to coincide with the domain
of continuity of the whole program.

• On the other hand, the system of Chaudhuri et al. allows one to express
continuity along a restricted set of variables, which we cannot do. To illustrate
this, let us look at the program λx, y. if (x = 0) then (3 ∗ y) else (4 ∗ y):
along the variable y, this program is continuous on the whole of R. Chaudhuri
et al. are able to express and prove this statement in their system, while we
can only say that for every real a, this program is continuous on the domain
$\{a\} \times R$.

For the sake of simplicity, it is useful to slightly simplify our calculus; the ideas
we present here would still be valid in a more general setting, but
that would make the presentation and proofs more involved. As usual, let $\mathfrak{F}$ be
a collection of real-valued functions. We consider the restriction of the calculus
$\Lambda^{\times,\to,R}_{\mathfrak{F}}$ obtained by considering types of the form

\[
\tau ::= R \mid \rho; \qquad
\rho ::= \rho_1 \times \cdots \times \rho_n \times \underbrace{R \times \cdots \times R}_{m\text{-times}} \to \tau
\]

only. For the sake of readability, we employ the notation $(\rho_1, \ldots, \rho_n, R, \ldots, R) \to \tau$
in place of $\rho_1 \times \cdots \times \rho_n \times R \times \cdots \times R \to \tau$. We also overload the notation and
keep indicating the resulting calculus as $\Lambda^{\times,\to,R}_{\mathfrak{F}}$. Nonetheless, the reader should
keep in mind that from now on, whenever referring to a $\Lambda^{\times,\to,R}_{\mathfrak{F}}$ term, we are
tacitly referring to a term typable according to the restricted type system, but
that can indeed contain conditionals.

Since we want to be able to talk about composition properties of locally 
continuous programs, we actually need to talk not only about the points where 
a program is continuous, but also about the image of this continuity domain. 
In higher-order languages, a well-established framework for the latter kind of
specification is that of refinement types, which were first introduced
in [31] in the context of ML types: the basic idea is to annotate an existing
type system with logical formulas, with the aim of being more precise about 
the underlying program’s behaviors than in simple types. Here, we are going to 
adapt this framework by replacing the image annotations provided by standard 
refinement types with continuity annotations. 


6.1 A Refinement Type System Ensuring Local Continuity 


Our refinement type system is developed on top of the simple types system of 
Section 2 (actually, on the simplification of such a system we are considering in 
this section). We first need to introduce a set of logical formulas which talk about 
n-uples of real numbers, and which we use as annotations in our refinement types. 
We consider a set V of logical variables, and we construct formulas as follows: 


\[
\begin{array}{l}
\psi, \phi \in \mathcal{L} ::= \top \mid (e \leq e) \mid \psi \wedge \phi \mid \neg\psi, \\
e \in \mathcal{E} ::= \alpha \mid a \mid f(e, \ldots, e) \quad \text{with } \alpha \in V,\ a \in R,\ f : R^n \to R.
\end{array}
\]

Recall that with the connectives in our logic, we are able to encode logical
disjunction and implication, and as customary, we write $\phi \Rightarrow \psi$ for $\neg\phi \vee \psi$. A
real assignment is a partial map $\sigma : V \to R$. When $\sigma$ has finite support, we
sometimes specify $\sigma$ by writing $(\alpha_1 \mapsto \sigma(\alpha_1), \ldots, \alpha_n \mapsto \sigma(\alpha_n))$. We write $\sigma \models \phi$
when $\sigma$ is defined on the variables occurring in $\phi$, and moreover the real formula
obtained by replacing the logical variables of $\phi$ according to $\sigma$ is true. We write $\models \phi$
when $\sigma \models \phi$ always holds, independently of $\sigma$.

We can associate to every formula the subset of $R^n$ consisting of all points
where this formula holds: more precisely, if $\phi$ is a formula, and $X = \alpha_1, \ldots, \alpha_n$
is a list of logical variables such that $Vars(\phi) \subseteq X$, we call truth domain of $\phi$
w.r.t. X the set:


\[
Dom(\phi)^X = \{(a_1, \ldots, a_n) \in R^n \mid (\alpha_1 \mapsto a_1, \ldots, \alpha_n \mapsto a_n) \models \phi\}.
\]
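The formula language, the satisfaction relation, and truth domains can be made concrete with a small interpreter. The Python sketch below is an assumption-laden illustration rather than the paper's artifact: all class and function names (`Var`, `Leq`, `satisfies`, `in_truth_domain`, and so on) are hypothetical, formulas are restricted to the four constructors of the grammar, and membership in a truth domain is decided for one given point only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Set, Tuple, Union


@dataclass(frozen=True)
class Var:            # logical variable from V
    name: str

@dataclass(frozen=True)
class Const:          # real constant
    value: float

@dataclass(frozen=True)
class App:            # f(e, ..., e) for a real-valued function f
    fn: Callable[..., float]
    args: Tuple["Expr", ...]

Expr = Union[Var, Const, App]

@dataclass(frozen=True)
class Top:            # the always-true formula
    pass

@dataclass(frozen=True)
class Leq:            # (e <= e)
    left: Expr
    right: Expr

@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"

@dataclass(frozen=True)
class Not:
    body: "Formula"

Formula = Union[Top, Leq, And, Not]


def expr_vars(e: Expr) -> Set[str]:
    if isinstance(e, Var):
        return {e.name}
    if isinstance(e, Const):
        return set()
    return set().union(*(expr_vars(a) for a in e.args))


def formula_vars(phi: Formula) -> Set[str]:
    if isinstance(phi, Top):
        return set()
    if isinstance(phi, Leq):
        return expr_vars(phi.left) | expr_vars(phi.right)
    if isinstance(phi, And):
        return formula_vars(phi.left) | formula_vars(phi.right)
    return formula_vars(phi.body)


def eval_expr(sigma: Dict[str, float], e: Expr) -> float:
    if isinstance(e, Var):
        return sigma[e.name]
    if isinstance(e, Const):
        return e.value
    return e.fn(*(eval_expr(sigma, a) for a in e.args))


def holds(sigma: Dict[str, float], phi: Formula) -> bool:
    if isinstance(phi, Top):
        return True
    if isinstance(phi, Leq):
        return eval_expr(sigma, phi.left) <= eval_expr(sigma, phi.right)
    if isinstance(phi, And):
        return holds(sigma, phi.left) and holds(sigma, phi.right)
    return not holds(sigma, phi.body)


def satisfies(sigma: Dict[str, float], phi: Formula) -> bool:
    """sigma |= phi: sigma must be defined on every variable of phi."""
    return formula_vars(phi).issubset(sigma) and holds(sigma, phi)


def in_truth_domain(phi: Formula, xs: Sequence[str], point: Sequence[float]) -> bool:
    """Membership of `point` in the truth domain of phi w.r.t. the list xs."""
    return satisfies(dict(zip(xs, point)), phi)


def implies(phi: Formula, psi: Formula) -> Formula:
    """Encoded implication: not(phi and not psi)."""
    return Not(And(phi, Not(psi)))


# Example formula: (a1 <= 3) and not (a2 <= 0).
phi = And(Leq(Var("a1"), Const(3.0)), Not(Leq(Var("a2"), Const(0.0))))
print(in_truth_domain(phi, ["a1", "a2"], (1.0, 2.0)))
```

Checking the variable support before evaluating matters because the assignment is a partial map: without that check, a negated formula over an undefined variable would wrongly be reported as satisfied.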


We are now ready to define the language of refinement types, which can be
seen as simple types annotated by logical formulas. The type R is annotated by
logical variables: this way we obtain refinement real types of the form $\{\alpha \in R\}$.
The crux of our refinement type system consists in the annotations we put on
the arrows. We introduce two distinct refined arrow constructs, depending on
the shape of the target type: more precisely we annotate the arrow of a type
$(T_1, \ldots, T_n) \to R$ with two logical formulas, while we annotate $(T_1, \ldots, T_n) \to H$
(where H is a higher-order type) with only one logical formula. This way, we obtain
refined arrow types of the form $(T_1, \ldots, T_n) \xrightarrow{\psi \rightsquigarrow \phi} \{\alpha \in R\}$, and $(T_1, \ldots, T_n) \xrightarrow{\psi} H$:
in both cases the formula $\psi$ specifies the continuity domain, while the formula
$\phi$ is an image annotation used only when the target type is ground. The intuition
is as follows: a program of type $(H_1, \ldots, H_m, \{\alpha_1 \in R\}, \ldots, \{\alpha_n \in R\}) \xrightarrow{\psi \rightsquigarrow \phi} \{\alpha \in R\}$
uses its real arguments continuously on the domain specified by the formula $\psi$
(w.r.t. $\alpha_1, \ldots, \alpha_n$), and this domain is sent into the domain specified by the
formula $\phi$ (w.r.t. $\alpha$). Similarly, a program of the type $(T_1, \ldots, T_n) \xrightarrow{\psi} H$ has its real
arguments used in a continuous way on the domain specified by $\psi$, but it is no
longer possible to specify an image domain, because H is higher-order.

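The general form of our refined types is thus as follows:


\[
\begin{array}{l}
T ::= H \mid F; \qquad F ::= \{\alpha \in R\}; \\
H ::= (H_1, \ldots, H_m, F_1, \ldots, F_n) \xrightarrow{\psi} H \mid (H_1, \ldots, H_m, F_1, \ldots, F_n) \xrightarrow{\psi \rightsquigarrow \phi} F
\end{array}
\]


with $n + m > 0$, $Vars(\phi) \subseteq \{\alpha\}$, $Vars(\psi) \subseteq \{\alpha_1, \ldots, \alpha_n\}$ when $F = \{\alpha \in R\}$,
$F_i = \{\alpha_i \in R\}$, and the $(\alpha_i)_{1 \leq i \leq n}$ are distinct. We take refinement types up to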
renaming of logical variables. If T is a refinement type, we write T for the simple 
type we obtain by forgetting about the annotations in T. 


Example 3. We illustrate in this example the intended meaning of our refinement
types.

• We first look at how to refine $R \to R$: those are types of the form
$\{\alpha_1 \in R\} \xrightarrow{\phi_1 \rightsquigarrow \phi_2} \{\alpha_2 \in R\}$. The intended inhabitants of these types are the programs
$t : R \to R$ such that i) $[\![t]\!]$ is continuous on the truth domain of $\phi_1$; and
ii) $[\![t]\!]$ sends the truth domain of $\phi_1$ into the truth domain of $\phi_2$. As an
example, $\phi_1$ could be $(\alpha_1 < 3)$, and $\phi_2$ could be $(\alpha_2 > 5)$. An example of a
program having this type is $t = \lambda x.(5 + f(x))$, where $f : R \to R$ is defined as

\[
f(a) = \begin{cases} \dfrac{1}{3 - a} & \text{if } a < 3 \\ 0 & \text{otherwise} \end{cases}
\]

and moreover we assume that $\{f, +\} \subseteq \mathfrak{F}$.

• We look now at the possible refinements of $R \to (R \to R)$: those are of the form
$\{\alpha_1 \in R\} \xrightarrow{\theta_1} (\{\alpha_2 \in R\} \xrightarrow{\theta_2 \rightsquigarrow \theta_3} \{\alpha_3 \in R\})$. The intended inhabitants of these
types are the programs $t : R \to (R \to R)$ whose interpretation function
$(x, y) \in R^2 \mapsto [\![t]\!](x)(y)$ sends continuously $Dom(\theta_1)^{\alpha_1} \times Dom(\theta_2)^{\alpha_2}$ into $Dom(\theta_3)^{\alpha_3}$.
As an example, consider $\theta_1 = (\alpha_1 < 1)$, $\theta_2 = (\alpha_2 < 3)$, and $\theta_3 = (\alpha_3 > 0)$.
An example of a program having this type is $\lambda x_1.\lambda x_2. f(x_1 * x_2)$ where we
take f as above.


A refined typing context $\Gamma$ is a list $x_1 : T_1, \ldots, x_n : T_n$, where each $T_i$ is a
refinement type. In order to express continuity constraints, we need to annotate
typing judgments by logical formulas, in a way similar to what we do for arrow
types. More precisely, we consider two kinds of refined typing judgments: one
for terms of higher-order type, and one for terms of ground type:


\[
\Gamma \vdash_r^{\psi} t : H; \qquad \Gamma \vdash_r^{\psi \rightsquigarrow \phi} t : F.
\]


6.2 Basic Typing Rules 


We first consider refinement typing rules for the fragment of our language which 
excludes conditionals: they are given in Figure 4. We illustrate them by way of 
a series of examples. 


Example 4. We first look at the typing rule var-F: if $\theta$ implies $\theta'$, then the
variable x (which, in semantic terms, performs the projection of the context $\Gamma$ to
one of its components) sends continuously the truth domain of $\theta$ into the truth
domain of $\theta'$. Using this rule we can, for instance, derive the following judgment:


\[
x : \{\alpha \in R\},\ y : \{\beta \in R\} \vdash_r^{(\alpha > 0 \wedge \beta > 0) \rightsquigarrow (\alpha > 0)} x : \{\alpha \in R\} \tag{1}
\]


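Example 5. We now look at the Rf rule, which deals with functions from $\mathfrak{F}$. Using
this rule, we can show that:


\[
x : \{\alpha \in R\},\ y : \{\beta \in R\} \vdash_r^{(\alpha \geq 0 \wedge \beta \geq 0) \rightsquigarrow (\gamma \geq 0)} \min(x, y) : \{\gamma \in R\}. \tag{2}
\]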


Before giving the refined typing rule for the if-then-else construct, we also
illustrate on an example how the rules in Figure 4 allow us to exploit, compositionally,
the continuity information we have on functions in $\mathfrak{F}$.

[Figure 4: inference rules var-H, var-F, Rf, abs, and app]

The formula $\psi(\eta)$ should be read as $\psi$ when T is a higher-order type, and as $\psi \rightsquigarrow \eta$
when T is a ground type.


Fig. 4: Typing Rules


Example 6. Let $f : R \to R$ be the function defined as:

\[
f(x) = \begin{cases} -x & \text{if } x < 0 \\ x + 1 & \text{otherwise} \end{cases}
\]

Observe that we can actually regard f as represented by the program in Figure 3a,
but we consider it as a primitive function in $\mathfrak{F}$ for the time being, since
we have not yet introduced the typing rule for the if-then-else construct.
Consider the program:


\[
t = \lambda(x, y).f(\min(x, y)).
\]


We see that $[\![t]\!] : R^2 \to R$ is continuous on the set $\{(x, y) \mid x \geq 0 \wedge y \geq 0\}$,
and that, moreover, the image of $[\![t]\!]$ on this set is contained in $[1, +\infty)$. Using
the rules in Figure 4, the fact that f is continuous on $R_{\geq 0}$, and that min is
continuous on $R^2$, we see that our refined type system allows us to prove t to be
continuous in the considered domain, i.e.:


\[
\vdash_r t : (\{\alpha \in R\}, \{\beta \in R\}) \xrightarrow{(\alpha \geq 0 \wedge \beta \geq 0) \rightsquigarrow (\gamma \geq 1)} \{\gamma \in R\}.
\]
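The image claim of Example 6 can be sanity-checked by sampling. The Python sketch below is only a numerical illustration under stated assumptions: `f` and `t` transcribe the definitions of the example, the sampling grid is an arbitrary choice, and a finite sample does not replace the typing derivation.

```python
import itertools


def f(x: float) -> float:
    """The primitive function of Example 6 (same graph as Figure 3a)."""
    return -x if x < 0 else x + 1


def t(x: float, y: float) -> float:
    """t = lambda (x, y). f(min(x, y))."""
    return f(min(x, y))


# Sample the closed quadrant {(x, y) | x >= 0 and y >= 0} on a grid 0.0 .. 10.0.
grid = [i / 4 for i in range(0, 41)]
image_in_range = all(t(x, y) >= 1 for x, y in itertools.product(grid, grid))
print(image_in_range)

# Outside the quadrant the image annotation can fail: min(-0.5, 2) < 0,
# so f takes the first branch and returns 0.5.
print(t(-0.5, 2.0))
```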


6.3 Typing Conditionals 


We now look at the rule for the if-then-else construct: as can be seen in the
two programs in Figure 3, the use of conditionals may or may not induce
discontinuity points. The crux here is the behaviour of the two branches at the
discontinuity points of the guard function. In the two programs represented in
Figure 3, we see that the only discontinuity point of the guard is x = 0. However,
in Figure 3b the two branches return the same value at 0, and the resulting
program is thus continuous at x = 0, while in Figure 3a the two branches do
not coincide at 0, and the resulting program is discontinuous at x = 0. We can
generalize this observation: for the program if t then s else p to be continuous,
we need the branches s and p to be continuous respectively on the domain
where t is 1, and on the domain where t is 0, and moreover we need s and p
to be continuous and to coincide on the points where t is not continuous.
Similarly to the logical system designed by Chaudhuri et al. [18], the coincidence of
the branches at the discontinuity points is expressed as a set of logical rules by
way of observational equivalence. It should be observed that such an equivalence
check is less problematic for first-order programs than it is for higher-order ones
(the authors of [18] are able to actually check observational equivalence through
an SMT solver). On the other hand, various notions of equivalence which are
included in contextual equivalence and sometimes coincide with it (e.g., applicative
bisimilarity, denotational semantics, or logical relations themselves) have
been developed for higher-order languages, and this starts to give rise to actual
automatic tools for deciding contextual equivalence [38].

We give in Figure 5 the typing rule for conditionals. The conclusion of the
rule guarantees the continuity of the program if t then s else p on a domain
specified by a formula $\theta$. The premises of the rule ask for formulas $\theta_q$, for
$q \in \{t, s, p\}$, that specify continuity domains for the programs t, s, p, and ask also
for two additional formulas $\theta_{(t,0)}$ and $\theta_{(t,1)}$ that specify domains where the value
of the guard t is 0 and 1, respectively. The target formula $\theta$ and the formulas
$(\theta_q)_{q \in \{t, s, p, (t,1), (t,0)\}}$ are related by two side-conditions. Side-condition (1)
consists of the following four distinct requirements, which must hold for every point a
in the truth domain of $\theta$: i) a is in the truth domain of at least one of the two
formulas $\theta_s$, $\theta_p$; ii) if a is not in the truth domain of $\theta_{(t,1)}$ (i.e., we have no guarantee
that t will return 1 at point a, meaning that the program p may be executed), then a
must be in the continuity domain of p; iii) a condition symmetric to the previous one,
replacing 1 by 0, and p by s; iv) all points of possible discontinuity (i.e. the points a such
that $\theta_t$ does not hold) must be in the continuity domain of both s and p, and as
a consequence both $\theta_s$ and $\theta_p$ must hold there. Side-condition (2) uses typed
contextual equivalence $\equiv^{ctx}$ between terms to express that the two programs s
and p must coincide on all inputs such that $\theta_t$ does not hold, i.e. that are not
in the continuity domain of t. Observe that typed contextual equivalence here is
defined with respect to the system of simple types.
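The two side-conditions can be read pointwise, which makes them easy to experiment with on the programs of Figure 3. The Python sketch below is a simplified illustration and not the rule itself: formulas are modelled as plain predicates on a single real input, the check is performed only on a hypothetical list of sample points, and side-condition (2) is approximated by comparing the values of the two branches at a point instead of establishing contextual equivalence. All names are assumptions introduced for the example.

```python
from typing import Callable

Pred = Callable[[float], bool]
Fun = Callable[[float], float]


def side_condition_1(a: float, th: Pred, th_t: Pred, th_t0: Pred,
                     th_t1: Pred, th_s: Pred, th_p: Pred) -> bool:
    """Pointwise reading of side-condition (1) at the input a."""
    if not th(a):
        return True  # the implication is vacuous outside the target domain
    return ((th_s(a) or th_p(a))                      # requirement i)
            and (th_t1(a) or th_p(a))                 # requirement ii)
            and (th_t0(a) or th_s(a))                 # requirement iii)
            and (th_t(a) or (th_s(a) and th_p(a))))   # requirement iv)


def side_condition_2(a: float, th: Pred, th_t: Pred, s: Fun, p: Fun) -> bool:
    """Simplified side-condition (2): branches agree where the guard may jump."""
    if th(a) and not th_t(a):
        return s(a) == p(a)
    return True


top: Pred = lambda a: True
th_t: Pred = lambda a: a != 0    # the guard x < 0 is continuous away from 0
th_t1: Pred = lambda a: a < 0    # the guard returns 1
th_t0: Pred = lambda a: a >= 0   # the guard returns 0

samples = [-2.0, -0.5, 0.0, 0.5, 2.0]

# Figure 3b: branches 1 and x + 1, target domain the whole real line.
fig3b_ok = all(
    side_condition_1(a, top, th_t, th_t0, th_t1, top, top)
    and side_condition_2(a, top, th_t, lambda x: 1.0, lambda x: x + 1)
    for a in samples
)

# Figure 3a: branches -x and x + 1 disagree at 0, so (2) fails on the whole line...
fig3a_ok = all(
    side_condition_2(a, top, th_t, lambda x: -x, lambda x: x + 1)
    for a in samples
)

# ...but holds vacuously once the target domain excludes 0.
fig3a_away_from_zero = all(
    side_condition_2(a, th_t, th_t, lambda x: -x, lambda x: x + 1)
    for a in samples
)

print(fig3b_ok, fig3a_ok, fig3a_away_from_zero)
```

The third check mirrors the earlier observation that the program of Figure 3a, while not continuous on R, is still locally continuous on domains that avoid the jump of its guard.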


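Notation 1. We use the following notations in Figure 5. When $\Gamma$ is a typing
environment, we write $G\Gamma$ and $H\Gamma$ for the ground and higher-order parts of
$\Gamma$, respectively. Moreover, suppose we have a ground refined typing environment
$\Theta = x_1 : \{\alpha_1 \in R\}, \ldots, x_n : \{\alpha_n \in R\}$: we say that a logical assignment $\sigma$ is
compatible with $\Theta$ when $\{\alpha_i \mid 1 \leq i \leq n\} \subseteq supp(\sigma)$. When this is the case,
we build in a natural way the substitution associated to $\sigma$ along $\Theta$ by taking
$\sigma^{\Theta}(x_i) = \sigma(\alpha_i)$.


\[
\frac{
\begin{array}{c}
\Gamma \vdash_r^{\theta_t \rightsquigarrow (\beta = 0 \vee \beta = 1)} t : \{\beta \in R\} \\
\Gamma \vdash_r^{\theta_{(t,0)} \rightsquigarrow (\beta = 0)} t : \{\beta \in R\} \\
\Gamma \vdash_r^{\theta_{(t,1)} \rightsquigarrow (\beta = 1)} t : \{\beta \in R\}
\end{array}
\qquad
\Gamma \vdash_r^{\theta_s(\eta)} s : T
\qquad
\Gamma \vdash_r^{\theta_p(\eta)} p : T
\qquad (1), (2)
}{
\Gamma \vdash_r^{\theta(\eta)} \mathtt{if}\ t\ \mathtt{then}\ s\ \mathtt{else}\ p : T
}\ \text{If}
\]

Again, the formula $\psi(\eta)$ should be read as $\psi$ when T is a higher-order type, and as
$\psi \rightsquigarrow \eta$ when T is a ground type. The side-conditions (1), (2) are given as:

1. $\models \theta \Rightarrow ((\theta_s \vee \theta_p) \wedge (\theta_{(t,1)} \vee \theta_p) \wedge (\theta_{(t,0)} \vee \theta_s) \wedge (\theta_t \vee (\theta_s \wedge \theta_p)))$.

2. For every logical assignment $\sigma$ compatible with $G\Gamma$, $\sigma \models \theta \wedge \neg\theta_t$ implies
$H\Gamma \vdash s\sigma^{G\Gamma} \equiv^{ctx} p\sigma^{G\Gamma}$.


Fig. 5: Typing Rules for the if-then-else construct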


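Example 7. Using our if-then-else typing rule, we can indeed type the program
in Figure 3b as expected:


\[
\vdash_r \lambda x.\ \mathtt{if}\ x < 0\ \mathtt{then}\ 1\ \mathtt{else}\ x + 1 : \{\alpha \in R\} \xrightarrow{\top \rightsquigarrow \top} \{\beta \in R\}.
\]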


6.4 Open-logical Predicates for Refinement Types 


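Our goal in this section is to show the correctness of our refinement type system,
which we state below.


Theorem 3. Let t be any program such that:


\[
x_1 : \{\alpha_1 \in R\}, \ldots, x_n : \{\alpha_n \in R\} \vdash_r^{\theta \rightsquigarrow \theta'} t : \{\beta \in R\}.
\]


Then it holds that:
• $[\![t]\!](Dom(\theta)^{\alpha_1, \ldots, \alpha_n}) \subseteq Dom(\theta')^{\beta}$;
• $[\![t]\!]$ is sequentially continuous on $Dom(\theta)^{\alpha_1, \ldots, \alpha_n}$.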


As a first step, we show that our if-then-else rule is reasonable, i.e. that it
behaves well with primitive functions in $\mathfrak{F}$. More precisely, if we suppose that
the functions $f, g_0, g_1$ are such that the premises of the if-then-else rule hold,
then the program if $f(x_1, \ldots, x_n)$ then $g_1(x_1, \ldots, x_n)$ else $g_0(x_1, \ldots, x_n)$ is
indeed continuous in the domain specified by the conclusion of the rule. This is
precisely what we prove in the following lemma.


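Lemma 4. Let $f, g_0, g_1 : R^n \to R$ be functions in $\mathfrak{F}$, and
$\Theta = x_1 : \{\alpha_1 \in R\}, \ldots, x_n : \{\alpha_n \in R\}$. We denote by $\alpha$ the list of logical
variables $\alpha_1, \ldots, \alpha_n$. We consider logical formulas $\theta$ and $\theta_f, \theta_{(f,0)}, \theta_{(f,1)}, \phi_{g_0}, \phi_{g_1}$
that have their logical variables in $\alpha$, and such that: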



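1. f is continuous on $Dom(\theta_f)^{\alpha}$ with $f(Dom(\theta_f)^{\alpha}) \subseteq \{0, 1\}$ and
$f(Dom(\theta_{(f,b)})^{\alpha}) \subseteq \{b\}$ for $b \in \{0, 1\}$.

2. $g_0$ and $g_1$ are continuous on $Dom(\phi_{g_0})^{\alpha}$ and $Dom(\phi_{g_1})^{\alpha}$ respectively, and
$(\alpha_1 \mapsto a_1, \ldots, \alpha_n \mapsto a_n) \models \theta \wedge \neg\theta_f$ implies $g_0(a_1, \ldots, a_n) = g_1(a_1, \ldots, a_n)$;

3. $\models \theta \Rightarrow ((\phi_{g_1} \vee \phi_{g_0}) \wedge (\theta_{(f,0)} \vee \phi_{g_1}) \wedge (\theta_{(f,1)} \vee \phi_{g_0}) \wedge (\theta_f \vee (\phi_{g_0} \wedge \phi_{g_1})))$.

Then it holds that:


\[
[\![\Theta \vdash \mathtt{if}\ f(x_1, \ldots, x_n)\ \mathtt{then}\ g_1(x_1, \ldots, x_n)\ \mathtt{else}\ g_0(x_1, \ldots, x_n) : R]\!]
\]


is continuous on $Dom(\theta)^{\alpha}$.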


Proof. The proof can be found in the extended version [7]. 


Similarly to what we did in Section 4, we are going to show Theorem 3
by way of a logical predicate. Recall that the logical predicate we defined in
Section 4 consists actually of three kinds of predicates, all defined in Definition 1
of Section 4: $\mathcal{F}^{\Theta}_{\tau}$, $\mathcal{F}^{\Theta}_{\Gamma}$, $\mathcal{F}^{\Gamma,\Theta}_{\tau}$, where $\Theta$ ranges over ground typing environments,
$\Gamma$ ranges over arbitrary environments, and $\tau$ is a type. The first predicate $\mathcal{F}^{\Theta}_{\tau}$
contains admissible terms t of type $\Theta \vdash t : \tau$, the second predicate $\mathcal{F}^{\Theta}_{\Gamma}$ contains
admissible substitutions $\gamma$ that associate to every $(x : \tau)$ in $\Gamma$ a term of type $\tau$
under the typing context $\Theta$, and the third predicate $\mathcal{F}^{\Gamma,\Theta}_{\tau}$ contains admissible
terms t of type $\Gamma, \Theta \vdash t : \tau$.

Here, we need to adapt the three kinds of logical predicates to a refinement
scenario: first, we replace $\tau$ and $\Theta, \Gamma$ with refinement types and refined typing
contexts respectively. Moreover, for technical reasons, we also need to generalize
our typing contexts, by allowing them to be annotated with any subset of $R^n$
instead of restricting ourselves to those subsets generated by logical formulas.
Due to this further complexity, we split our definition of logical predicates into
two: we first define the counterpart of the ground typing context predicate $\mathcal{F}^{\Theta}_{\tau}$
in Definition 4, then the counterpart of the predicate for substitutions $\mathcal{F}^{\Theta}_{\Gamma}$ and
the counterpart of the predicates $\mathcal{F}^{\Gamma,\Theta}_{\tau}$ for higher-order typing environments in
Definition 5.

Let us first see how we can adapt the predicates $\mathcal{F}^{\Theta}_{\tau}$ to our refinement types
setting. Recall that in Section 4, we defined the predicate $\mathcal{F}^{\Theta}_{R}$ as the collection of
terms t such that $\Theta \vdash t : R$, and its semantics $[\![\Theta \vdash t : R]\!]$ belongs to $\mathfrak{F}$. As we are
interested in local continuity properties, we need to build a predicate expressing
local continuity constraints. Moreover, in order to be consistent with our two
arrow constructs and our two kinds of typing judgments, we actually need to
consider also two kinds of logical predicates, depending on whether the target
type we consider is a real type or a higher-order type. We thus introduce the
following logical predicates:


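\[
\mathcal{C}(\Theta, X \rightsquigarrow \phi, F); \qquad \mathcal{C}(\Theta, X, H);
\]


where $\Theta$ is a ground typing environment, X is a subset of $R^n$, $\phi$ is a logical
formula, and, as usual, F ranges over the real refinement types, while H ranges
over the higher-order refinement types. As expected, X and $\phi$ are needed to
encode continuity constraints inside our logical predicates.

Definition 4. Let $\Theta$ be a ground typing context of length n, F and H refined
ground type and higher-order type, respectively. We define families of predicates
on terms $\mathcal{C}(\Theta, Y \rightsquigarrow \phi, F)$ and $\mathcal{C}(\Theta, Y, H)$, with $Y \subseteq R^n$ and $\phi$ a logical formula,
as specified in Figure 6.


• For $F = \{\alpha \in R\}$ we take:

\[
\mathcal{C}(\Theta, Y \rightsquigarrow \psi, F) := \{t \mid x_1 : R, \ldots, x_n : R \vdash t : R,\
[\![t]\!](Y) \subseteq Dom(\psi)^{\alpha} \wedge [\![t]\!] \text{ continuous over } Y\}.
\]

• If H is an arrow type of the form $H = (H_1, \ldots, H_m, \{\alpha_1 \in R\}, \ldots, \{\alpha_p \in R\}) \xrightarrow{\psi(\eta)} T$:

\[
\begin{array}{rl}
\mathcal{C}(\Theta, Y, H) := \{t \mid & x_1 : R, \ldots, x_n : R \vdash t : H, \\
& \forall Z,\ \forall \vec{s} = (s_1, \ldots, s_m) \text{ with } s_i \in \mathcal{C}(\Theta, Z, H_i), \\
& \forall \vec{p} = (p_1, \ldots, p_p),\ \forall \psi^j \text{ with } \models \psi^1 \wedge \ldots \wedge \psi^p \Rightarrow \psi, \\
& \text{and } p_j \in \mathcal{C}(\Theta, Z \rightsquigarrow \psi^j, \{\alpha_j \in R\}), \\
& \text{it holds that } t(\vec{s}, \vec{p}) \in \mathcal{C}(\Theta, (Y \cap Z)(\eta), T)\},
\end{array}
\]

where as usual we should read $\psi(\eta) = \psi$, $(Y \cap Z)(\eta) = Y \cap Z$ when T is
higher-order, and $\psi(\eta) = \psi \rightsquigarrow \eta$, $(Y \cap Z)(\eta) = (Y \cap Z) \rightsquigarrow \eta$ when T is an annotated
real type.


Fig. 6: Open Logical Predicates for Refinement Types.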


Example 8. We illustrate Definition 4 on some examples. We denote by $B^{\circ}$ the
open unit ball in $R^2$, i.e. $B^{\circ} = \{(a, b) \in R^2 \mid a^2 + b^2 < 1\}$. We consider the
ground typing context $\Theta = x_1 : \{\alpha_1 \in R\}, x_2 : \{\alpha_2 \in R\}$.

• We look first at the predicate $\mathcal{C}(\Theta, B^{\circ} \rightsquigarrow (\beta > 0), \{\beta \in R\})$. It consists of all
programs $x_1 : R, x_2 : R \vdash t : R$ such that $[\![x_1 : R, x_2 : R \vdash t : R]\!]$ is continuous
on the open unit ball, and takes only strictly positive values there.

• We look now at an example when the target type T is higher-order. We take
$H = \{\beta_1 \in R\} \xrightarrow{(\beta_1 \geq 0) \rightsquigarrow (\beta_2 \geq 0)} \{\beta_2 \in R\}$, and we look at the logical predicate
$\mathcal{C}(\Theta, B^{\circ}, H)$. We are going to show that the latter contains, for instance, the
program:

\[
t = \lambda w. f(w, x_1^2 + x_2^2) \quad \text{where } f(w, a) = \begin{cases} \dfrac{w}{1 - a} & \text{if } a < 1 \\ 0 & \text{otherwise.} \end{cases}
\]

Looking at Figure 6, we see that it is enough to check that for any $Y \subseteq R^2$
and any $s \in \mathcal{C}(\Theta, Y \rightsquigarrow (\beta_1 \geq 0), \{\beta_1 \in R\})$, it holds that:

\[
t\,s \in \mathcal{C}(\Theta, B^{\circ} \cap Y \rightsquigarrow (\beta_2 \geq 0), \{\beta_2 \in R\}).
\]

Our overall goal, in order to prove Theorem 3, is to show the counterpart
of the Fundamental Lemma from Section 4 (i.e. Lemma 1), which states that
the logical predicate $\mathcal{F}^{\Theta}_{\tau}$ contains all well-typed terms. This lemma only talks
about the logical predicates for ground typing contexts, so we can state it as of
now, but its proof relies on having all three predicates at our disposal.
Observe that from there, Theorem 3 follows just from the definition of the logical
predicates on base types. Similarly to what we did for Lemma 1 in Section 4,
proving it requires defining the logical predicates for substitutions and
higher-order typing contexts. We do this in Definition 5 below. As before, they consist in
an adaptation to our refinement types framework of the open logical predicates
$\mathcal{F}^{\Theta}_{\Gamma}$ and $\mathcal{F}^{\Gamma,\Theta}_{\tau}$ of Section 4: as usual, we need to add continuity annotations, and
distinguish whether the target type is a ground type or a higher-order type.


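Notation 2. We need to first introduce the following notation: let $\Gamma$, $\Theta$ be two
ground non-refined typing environments of length m and n respectively, and with
disjoint support. Let $\gamma : supp(\Gamma) \to \{t \mid \Theta \vdash t : R\}$ be a substitution. We write
$[\![\gamma]\!]$ for the real-valued function:


\[
\begin{array}{rcl}
[\![\gamma]\!] : R^n & \to & R^{n+m} \\
a & \mapsto & (a, [\![\gamma(x_1)]\!](a), \ldots, [\![\gamma(x_m)]\!](a))
\end{array}
\]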


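Definition 5. Let $\Theta$ be a ground typing environment of length n, and $\Gamma$ an
arbitrary typing environment. We write n and m for the lengths of $\Theta$ and $G\Gamma$,
respectively.
• Let $Z \subseteq R^n$, $W \subseteq R^{n+m}$. We define $\mathcal{C}(\Theta, Z \rightsquigarrow W, \Gamma)$ as the set of those
substitutions $\gamma : supp(\Gamma) \to \{t \mid \Theta \vdash t : R\}$ such that:
  • $\forall (x : H) \in H\Gamma$, $\gamma(x) \in \mathcal{C}(\Theta, Z, H)$,
  • $[\![\gamma|_{G\Gamma}]\!] : R^n \to R^{n+m}$ sends continuously Z into W;
• Let $W \subseteq R^{n+m}$, $F = \{\alpha \in R\}$ an annotated real type, and $\psi$ a logical formula
with $Vars(\psi) \subseteq \{\alpha\}$. We define:

\[
\mathcal{C}((\Gamma; \Theta), W \rightsquigarrow \psi, F) := \{t \mid \Gamma, \Theta \vdash t : R
\wedge \forall X \subseteq R^n,\ \forall \gamma \in \mathcal{C}(\Theta, X \rightsquigarrow W, \Gamma),\ t\gamma \in \mathcal{C}(\Theta, X \rightsquigarrow \psi, F)\}.
\]

• Let $W \subseteq R^{n+m}$, and H a higher-order refined type. We define:

\[
\mathcal{C}((\Gamma; \Theta), W, H) := \{t \mid \Gamma, \Theta \vdash t : H
\wedge \forall X \subseteq R^n,\ \forall \gamma \in \mathcal{C}(\Theta, X \rightsquigarrow W, \Gamma),\ t\gamma \in \mathcal{C}(\Theta, X, H)\}.
\]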


Example 9. We illustrate Definition 5 on an example. We consider the same
context $\Theta$ as in Example 8, i.e. $\Theta = x_1 : \{\alpha_1 \in R\}, x_2 : \{\alpha_2 \in R\}$, and we take
$\Gamma = x_3 : \{\alpha_3 \in R\}, z : H$, with $H = \{\beta_1 \in R\} \xrightarrow{(\beta_1 \geq 0) \rightsquigarrow (\beta_2 \geq 0)} \{\beta_2 \in R\}$. We are
interested in the following logical predicate for substitutions:

\[
\mathcal{C}(\Theta, B^{\circ} \rightsquigarrow \{(v, |v|) \mid v \in B^{\circ}\}, \Gamma)
\]

where the norm of the couple (a, b) is taken as $|(a, b)| = \sqrt{a^2 + b^2}$. We are
going to build a substitution $\gamma : \{x_3, z\} \to \Lambda^{\times,\to,R}_{\mathfrak{F}}$ that belongs to this set. We
take:

• $\gamma(z) = \lambda w. f(w, x_1^2 + x_2^2)$ where $f(w, a) = \frac{w}{1 - a}$ if $a < 1$, 0 otherwise.

• $\gamma(x_3) = \sqrt{x_1^2 + x_2^2}$.

We can check that the requirements of Definition 5 indeed hold for $\gamma$:

• $\gamma(z) \in \mathcal{C}(\Theta, B^{\circ}, H)$ (see Example 8);

• $[\![\gamma|_{G\Gamma}]\!] : R^2 \to R^3$ is continuous on $B^{\circ}$, and moreover sends $B^{\circ}$ into
$\{(v, |v|) \mid v \in B^{\circ}\}$. Looking at our definition of the semantics of a substitution,
we see that $[\![\gamma|_{G\Gamma}]\!](a, b) = (a, b, |(a, b)|)$, thus the requirements above
hold.


Lemma 5 (Fundamental Lemma). Let $\Theta$ be a ground typing context, and $\Gamma$
an arbitrary typing context (thus $\Gamma$ can contain both ground type variables and
non-ground type variables).

• Suppose that $\Gamma, \Theta \vdash_r^{\theta \rightsquigarrow \eta} t : F$: then $t \in \mathcal{C}(\Gamma; \Theta, Dom(\theta) \rightsquigarrow \eta, F)$.
• Suppose that $\Gamma, \Theta \vdash_r^{\theta} t : H$: then $t \in \mathcal{C}(\Gamma; \Theta, Dom(\theta), H)$.

Proof Sketch. The proof is by induction on the derivation of the refined typing
judgment. Along the way, we need to show that our logical predicates play well
with the underlying denotational semantics, but also with logic. The details can
be found in the extended version [7].


From there, we can finally prove the main result of this section, i.e. Theorem 3,
which states the correctness of our refinement type system. Indeed, Lemma 5
has Theorem 3 as a corollary: it is enough to look at the definition
of the logical predicate for first-order programs to show the correctness
of our type system.


7 Related Work 


Logical relations are certainly one of the most well-studied concepts in higher- 
order programming language theory. In their unary version, they have been 
introduced by Tait [54], and further exploited by Girard [33] and Tait [55] him- 
self in giving strong normalization proofs for second-order type systems. The 
relational counterpart of realizability, namely logical relations proper, have been 
introduced by Plotkin [48], and further developed along many different axes, and 
in particular towards calculi with fixpoint constructs or recursive types [3,4,2], 
probabilistic choice [14], or monadic and algebraic effects [34,11,34]. Without 
any hope to be comprehensive, we may refer to Mitchell’s textbook on program- 
ming language theory for a comprehensive account about the earlier, classic 
definitions [43], or to aforementioned papers for more recent developments. 

Extensions of logical relations to open terms have been introduced by several
authors [39,47,30,53,15] and were explicitly referred to as open logical relations
in [59]. However, to the best of the authors’ knowledge, all the aforementioned
works use open logical relations for specific purposes, and do not investigate
their applicability as a general methodology.

Special cases of our Containment Theorem can be found in many papers,
typically as auxiliary results. As already mentioned, an example is the one of
higher-order polynomials, whose first-order terms are proved to compute proper
polynomials in many ways [40,5], none of them in the style of logical relations.
The Containment Theorem itself can be derived from a previous result by
Lafont [41] (see also Theorem 4.10.7 in [24]). Contrary to such a result, however,
our proof of the Containment Theorem is entirely syntactical and consists of a
straightforward application of open logical relations.

Algorithms for automatic differentiation have recently been extended to higher- 
order programming languages [50,46,51,42,45], and have been investigated from 
a semantical perspective in [16,1] relying on insights from linear logic and deno- 
tational semantics. In particular, the work of Huot et al. [37] provides a deno- 
tational proof of correctness of the program transformation of [50] that we have 
studied in Section 5. 

Continuity and robustness analysis of imperative first-order programs by way 
of program logics is the topic of study of a series of papers by Chaudhuri and 
co-authors [19,18,20]. None of them, however, deal with higher-order programs. 


8 Conclusion and Future Work 


We have shown how a mild variation on the concept of a logical relation can be
fruitfully used for proving both predicative and relational properties of
higher-order programming languages, when such properties have a first-order, rather
than a ground “flavor”. As such, the added value of this contribution lies not so much
in the technique itself as in showing how useful it is in heterogeneous
contexts, thereby witnessing the versatility of logical relations.

The three case studies, and in particular the correctness of automatic dif- 
ferentiation and refinement type-based continuity analysis, are given as proof- 
of-concepts, but this does not mean they do not deserve to be studied more in 
depth. An example of an interesting direction for future work is the extension 
of our correctness proof from Section 5 to backward propagation differentiation 
algorithms. Another one consists in adapting the refinement type system of Sec- 
tion 6.1 to deal with differentiability. That would of course require a substantial 
change in the typing rule for conditionals, which should take care of checking not 
only continuity, but also differentiability at the critical points. It would also be 
interesting to implement the refinement type system using standard SMT-based
approaches. Finally, the authors plan to investigate extensions of open logical
relations to non-normalizing calculi, as well as to non-simply typed calculi (such
as calculi with polymorphic or recursive types).

Abstract. Game Logic is an excellent setting to study proofs-about-
programs via the interpretation of those proofs as programs, because 
constructive proofs for games correspond to effective winning strategies 
to follow in response to the opponent’s actions. We thus develop Con- 
structive Game Logic, which extends Parikh’s Game Logic (GL) with 
constructivity and with first-order programs à la Pratt’s first-order dy- 
namic logic (DL). Our major contributions include: 1. a novel realizability 
semantics capturing the adversarial dynamics of games, 2. a natural de- 
duction calculus and operational semantics describing the computational 
meaning of strategies via proof-terms, and 3. theoretical results includ- 
ing soundness of the proof calculus w.r.t. realizability semantics, progress 
and preservation of the operational semantics of proofs, and Existential 
Properties enabling the extraction of computational artifacts from game 
proofs. Together, these results provide the most general account of a 
Curry-Howard interpretation for any program logic to date, and the 
first at all for Game Logic. 


Keywords: Game Logic, Constructive Logic, Natural Deduction, Proof Terms 


1 Introduction 


Two of the most essential tools in theory of programming languages are program 
logics, such as Hoare calculi [29] and dynamic logics [45], and the Curry-Howard 
correspondence [17,31], wherein propositions correspond to types, proofs to func- 
tional programs, and proof term normalization to program evaluation. Their 
intersection, the Curry-Howard interpretation of program logics, has received 
surprisingly little study. We undertake such a study in the setting of Game 
Logic (GL) [38], because this leads to novel insights, because the Curry-Howard 
correspondence can be explained particularly intuitively for games, and because 
our first-order GL is a superset of common logics such as first-order Dynamic 
Logic (DL). 

Constructivity and program verification have met before: higher-order
constructive logics [16] obey the Curry-Howard correspondence and are used to
develop verified functional programs. Program logics are also often embedded
in constructive proof assistants such as Coq [48], inheriting constructivity from
their metalogic. Both are excellent ways to develop verified software, but we
study something else.


We study the computational content of a program logic itself. Every funda- 
mental concept of computation is expected to manifest in all three of logic, type 
systems, and category theory [27]. Because dynamic logics (DLs) such as GL
have shown that program execution is a first-class construct in modal logic, the 
theorist has an imperative to explore the underlying notion of computation by 
developing a constructive GL with a Curry-Howard interpretation. 


The computational content of a proof is especially clear in GL, which gen- 
eralizes DL to programmatic models of zero-sum, perfect-information games be- 
tween two players, traditionally named Angel and Demon. Both normal-play and 
misère-play games can be modeled in GL. In classical GL, the diamond modality
$\langle\alpha\rangle\phi$ and box modality $[\alpha]\phi$ say that Angel and Demon respectively have a strategy
to ensure $\phi$ is true at the end of $\alpha$, which is a model of a game. The difference
between classical GL and CGL is that classical GL allows proofs that exclude the 
middle, which correspond to strategies which branch on undecidable conditions. 
CGL proofs can branch only on decidable properties, thus they correspond to 
strategies which are effective and can be executed by computer. Effective strate- 
gies are crucial because they enable the synthesis of code that implements a 
strategy. Strategy synthesis is itself crucial because even simple games can have 
complicated strategies, and synthesis provides assurance that the implementa- 
tion correctly solves the game. A GL strategy resolves the choices inherent in a 
game: a diamond strategy specifies every move made by the Angel player, while 
a box strategy specifies the moves the Demon player will make. 


In developing Constructive Game Logic (CGL), adding constructivity is a 
deep change. We provide a natural deduction calculus for CGL equipped with 
proof terms and an operational semantics on the proofs, demonstrating the mean- 
ing of strategies as functional programs and of winning strategies as functional 
programs that are guaranteed to achieve their objective no matter what counter- 
strategy the opponent follows. While the proof calculus of a constructive logic 
is often taken as ground truth, we go a step further and develop a realizability 
semantics for CGL as programs performing winning strategies for game proofs, 
then prove the calculus sound against it. We adopt realizability semantics in 
contrast to the winning-region semantics of classical GL because it enables us 
to prove that CGL satisfies novel properties (Section 8). The proof of our Strat- 
egy Property (Theorem 2) constitutes an (on-paper) algorithm that computes a 
player’s (effective) strategy from a proof that they can win a game. This is the 
key test of constructivity for CGL, which would not be possible in classical GL. We 
show that CGL proofs have two computational interpretations: the operational 
semantics interpret an arbitrary proof (strategy) as a functional program which 
reduces to a normal-form proof (strategy), while realizability semantics interpret 
Angel strategies as programs which defeat arbitrary Demonic opponents.

While CGL has ample theoretical motivation, the practical motivations from
synthesis are also strong. A notable line of work on dGL extends first-order GL 
to hybrid games to verify safety-critical adversarial cyber-physical systems [42]. 
We have designed CGL to extend smoothly to hybrid games, where synthesis 
provides the correctness demanded by safety-critical systems and the synthesis 
of correct monitors of the external world [36]. 


2 Related Work 


This work is at the intersection of game logic and constructive modal logics. 
Individually, they have a rich literature, but little work has been done at their 
intersection. Of these, we are the first for GL and the first with a proofs-as- 
programs interpretation for a full first-order program logic. 


Games in Logic. Parikh’s propositional GL [38] was followed by coalitional 
GL [39]. A first-order version of GL is the basis of differential game logic dGL [42] 
for hybrid games. GLs are unique in their clear delegation of strategy to the proof
language rather than the model language, crucially allowing succinct game spec- 
ifications with sophisticated winning strategies. Succinct specifications are im- 
portant: specifications are trusted because proving the wrong theorem would not 
ensure correctness. Relatives without this separation include Strategy Logic [15], 
Alternating-Time Temporal Logic (ATL) [5], CATL [30], Ghosh’s SDGL [24], 
Ramanujam’s structured strategies [46], Dynamic-epistemic logics [6,10,49], ev- 
idence logics [9], and Angelic Hoare logic [35]. 


Constructive Modal Logics. A major contribution of CGL is our constructive 
semantics for games, not to be confused with game semantics [1], which are used 
to give programs semantics in terms of games. We draw on work in semantics 
for constructive modal logics, of which two main approaches are intuitionistic 
Kripke semantics and realizability semantics. 

An overview of Intuitionistic Kripke semantics is given by Wijesekera [52]. 
Intuitionistic Kripke semantics are parameterized over worlds, but in contrast 
to classical Kripke semantics, possible worlds represent what is currently known 
of the state. Worlds are preordered by $w_1 \geq w_2$ when $w_1$ contains at least
the knowledge in $w_2$. Kripke semantics were used in Constructive Concurrent
DL [53], where both the world and knowledge of it change during execution. A 
key advantage of realizability semantics [37,33] is their explicit interpretation of 
constructivity as computability by giving a realizer, a program which witnesses 
a fact. Our semantics combine elements of both: Strategies are represented by 
realizers, while the game state is a Kripke world. Constructive set theory [2] aids 
in understanding which set operations are permissible in constructive semantics. 

Modal semantics have also exploited mathematical structures such as: i) Neigh- 
borhood models [8], topological models for spatial logics [7], and temporal log- 
ics of dynamical systems [20]. ii) Categorical [3], sheaf [28], and pre-sheaf [23] 
models. iii) Coalgebraic semantics for classical Propositional Dynamic Logic
(PDL) [19]. While games are known to exhibit algebraic structure [25], such
laws are not essential to this work. Our semantics are also notable for the seam- 
less interaction between a constructive Angel and a classical Demon. 

CGL is first-order, so we must address the constructivity of operations that 
inspect game state. We consider rational numbers so that equality is decidable, 
but our work should generalize to constructive reals [11,13]. 

Intuitionistic modalities also appear in dynamic-epistemic logic (DEL) [21], 
but that work is interested primarily in proof-theoretic semantics while we em- 
ploy realizability semantics to stay firmly rooted in computation. Intuitionistic 
Kripke semantics have also been applied to multimodal System K with itera- 
tion [14], a weak fragment of PDL. 


Constructivity and Dynamic Logic. With CGL, we bring to fruition several past 
efforts to develop constructive dynamic logics. Prior work on PDL [18] sought 
an Existential Property for Propositional Dynamic Logic (PDL), but they ques- 
tioned the practicality of their own implication introduction rule, whose side 
condition is non-syntactic. One of our results is a first-order Existential Prop- 
erty, which Degen cited as an open problem beyond the methods of their day [18]. 
To our knowledge, only one approach [32] considers Curry-Howard or functional 
proof terms for a program logic. While their work is a notable precursor to 
ours, their logic is a weak fragment of PDL without tests, monotonicity, or un- 
bounded iteration, while we support not only PDL but the much more powerful 
first-order GL. Lastly, we are preceded by Constructive Concurrent Dynamic 
Logic [53], which gives a Kripke semantics for Concurrent Dynamic Logic [41],
a proper fragment of GL. Their work focuses on an epistemic interpretation of 
constructivity, algebraic laws, and tableaux. We differ in our use of realizability 
semantics and natural deduction, which were essential to developing a Curry- 
Howard interpretation for CGL. In summary, we are justified in claiming to have 
the first Curry-Howard interpretation with proof terms and Existential Proper- 
ties for an expressive program logic, the first constructive game logic, and the 
only with first-order proof terms. 

While constructive natural deduction calculi map most directly to functional 
programs, proof terms can be generated for any proof calculus, including a well- 
known interpretation of classical logic as continuation-passing style [26]. Proof 
terms have been developed [22] for a Hilbert calculus for dL, a dynamic logic 
(DL) for hybrid systems. Their work focuses on a provably correct interchange 
format for classical dL proofs, not constructive logics. 


3 Syntax 


We define the language of CGL, consisting of terms, games, and formulas. The 
simplest terms are program variables $x, y \in \mathcal{V}$ where $\mathcal{V}$ is the set of variable
identifiers. Globally-scoped mutable program variables contain the state of the
game, also called the position in game-theoretic terminology. All variables and
terms are rational-valued ($\mathbb{Q}$); we also write $\mathbb{B}$ for the set of Boolean values {0, 1}
for false and true respectively.

Definition 1 (Terms). A term f, g is a rational-valued computable function
over the game state. We give a nonexhaustive grammar of terms, specifically 
those used in our examples: 


$$f, g ::= \cdots \mid q \mid x \mid f + g \mid f \cdot g \mid f / g \mid f \bmod g$$


where $q \in \mathbb{Q}$ is a rational literal, x a program variable, f + g a sum, $f \cdot g$ a product.
Division-with-remainder is intended for use with integers, but we generalize the 
standard notion to support rational arguments. Quotient f/g is integer even 
when f and g are non-integer, and thus leaves a rational remainder f mod g. 
Divisors g are assumed to be nonzero. 
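
As a rough illustration of the generalized division-with-remainder described above, the following Python sketch computes an integer quotient and a rational remainder for rational arguments. The names `quot` and `rem` are ours, and rounding the quotient toward negative infinity is an assumption of this sketch; the definition above only requires that the quotient be an integer and the divisor nonzero.

```python
from fractions import Fraction
import math


def quot(f: Fraction, g: Fraction) -> int:
    """Integer quotient f/g of two rationals (floor rounding is our assumption)."""
    if g == 0:
        raise ZeroDivisionError("divisors are assumed to be nonzero")
    return math.floor(f / g)


def rem(f: Fraction, g: Fraction) -> Fraction:
    """Rational remainder f mod g, chosen so that f == g * quot(f, g) + rem(f, g)."""
    return f - g * quot(f, g)


# (7/2) divided by (3/2): integer quotient 2, rational remainder 1/2.
print(quot(Fraction(7, 2), Fraction(3, 2)), rem(Fraction(7, 2), Fraction(3, 2)))
```

On integer arguments this coincides with the usual division-with-remainder, which is the intended use case.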


A game in CGL is played between a constructive player named Angel and a 
classical player named Demon. Our usage of the names Angel and Demon differs 
subtly from traditional GL usage for technical reasons. Our Angel and Demon 
are asymmetric: Angel is “our” player, who must play constructively, while the 
“opponent” Demon is allowed to play classically because our opponent need not 
be a computer. At any time some player is active, meaning their strategy resolves
all decisions, and the opposite player is called dormant. Classical GL identifies 
Angel with active and Demon with dormant; the notions are distinct in CGL. 


Definition 2 (Games). The set of games $\alpha, \beta$ is defined recursively as such:
$$\alpha, \beta ::= {?\phi} \mid x := f \mid x := * \mid \alpha \cup \beta \mid \alpha; \beta \mid \alpha^* \mid \alpha^d$$


In the test game $?\phi$, the active player wins if they can exhibit a constructive
proof that formula $\phi$ currently holds. If they do not exhibit a proof, the dormant
player wins by default and we informally say the active player “broke the rules”.
In deterministic assignment games x := f, neither player makes a choice, but
the program variable x takes on the value of a term f. In nondeterministic
assignment games x := *, the active player picks a value for $x : \mathbb{Q}$. In the choice
game $\alpha \cup \beta$, the active player chooses whether to play game $\alpha$ or game $\beta$. In
the sequential composition game $\alpha; \beta$, game $\alpha$ is played first, then $\beta$ from the
resulting state. In the repetition game $\alpha^*$, the active player chooses after each
repetition of $\alpha$ whether to continue playing, but loses if they repeat $\alpha$ infinitely.
Notably, the exact number of repetitions can depend on the dormant player’s
moves, so the active player need not know, let alone announce, the exact number
of iterations in advance. In the dual game $\alpha^d$, the active player becomes dormant
and vice-versa, then $\alpha$ is played. We parenthesize games with braces $\{\alpha\}$ when
necessary. Sequential and nondeterministic composition both associate to the
right, i.e., $\alpha \cup \beta \cup \gamma \equiv \{\alpha \cup \{\beta \cup \gamma\}\}$. This does not affect their semantics as both
operators are associative, but aids in reading proof terms.


Definition 3 (CGL Formulas). The set of CGL formulas $\phi$ (also $\psi$, $\rho$) is given
recursively by the grammar:

$$\phi ::= \langle\alpha\rangle\phi \mid [\alpha]\phi \mid f \sim g$$

The defining constructs in CGL (and GL) are the modalities $\langle\alpha\rangle\phi$ and $[\alpha]\phi$.
These mean that the active or dormant Angel (i.e., constructive) player has a
constructive strategy to play $\alpha$ and achieve postcondition $\phi$. This paper does
not develop the modalities for active and dormant Demon (i.e., classical) players
because by definition those cannot be synthesized to executable code. We assume
the presence of interpreted comparison predicates $\sim\ \in \{\leq, <, =, \neq, >, \geq\}$.

The standard connectives of first-order constructive logic can be derived from
games and comparisons. Verum (tt) is defined 1 > 0 and falsum (ff) is 0 > 1.
Conjunction $\phi \wedge \psi$ is defined $\langle ?\phi\rangle\psi$, disjunction $\phi \vee \psi$ is defined $\langle ?\phi \cup {?\psi}\rangle\mathit{tt}$,
implication $\phi \to \psi$ is defined $[?\phi]\psi$, universal quantification $\forall x\, \phi$ is defined
$[x := *]\phi$, and existential quantification $\exists x\, \phi$ is defined $\langle x := *\rangle\phi$. As usual in
logic, equivalence $\phi \leftrightarrow \psi$ can also be defined $(\phi \to \psi) \wedge (\psi \to \phi)$. As usual in
constructive logics, negation $\neg\phi$ is defined $\phi \to \mathit{ff}$, and inequality is defined
by $f \neq g \equiv \neg(f = g)$. We will use the derived constructs freely but present
semantics and proof rules only for the core constructs to minimize duplication.
Indeed, it will aid in understanding of the proof term language to keep the
definitions above in mind, because the proof terms for many first-order programs
follow those from first-order constructive logic.

For convenience, we also write derived operators where the dormant player
is given control of a single choice before returning control to the active player.
The dormant choice $\alpha \cap \beta$, defined $\{\alpha^d \cup \beta^d\}^d$, says the dormant player chooses
which branch to take, but the active player is in control of the subgames. We
write $\phi_x^y$ (likewise for $\alpha$ and f) for the renaming of x for y and vice versa in
formula $\phi$, and write $\phi_x^f$ for the substitution of term f for program variable x in
$\phi$, if the substitution is admissible (Def. 9 in Section 6).


3.1 Example Games 


We demonstrate the meaning and usage of the CGL constructs via examples, 
culminating in the two classic games of Nim and cake-cutting. 


Nondeterministic Programs. Every (possibly nondeterministic) program is also 
a one-player game. For example, the program n := 0; {n := n + 1}* can nondeterministically
set n to any natural number because Angel has a choice whether
to continue after every repetition of the loop, but is not allowed to continue 
forever. Conversely, games are like programs where the environment (Demon) is 
adversarial, and the program (Angel) strategically resolves nondeterminism to 
overcome the environment. 


Demonic Counter. Angel’s choices often must be reactive to Demon’s choices. 
Consider the game $c := 10; \{c := c - 1 \cap c := c - 2\}^*; {?0 \leq c \leq 2}$ where Demon
repeatedly decreases c by 1 or 2, and Angel chooses when to stop. Angel only wins
because she can pass the test $0 \leq c \leq 2$, which she can do by simply repeating
the loop until $0 \leq c \leq 2$ holds. If Angel had to decide the loop duration in
advance, Demon could force a rules violation by “guessing” the duration and
changing his choices of c := c − 1 vs. c := c − 2.
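
The difference between a reactive and an announced loop duration can be checked by brute force. The sketch below is only an illustration under our own encoding: Demon's play is a sequence of decrements, the helper names are hypothetical, and the final test is read as 0 ≤ c ≤ 2.

```python
from itertools import product

START = 10


def in_goal(c):
    """The final test of the game: 0 <= c <= 2."""
    return 0 <= c <= 2


def reactive_angel_wins(demon_moves):
    """Angel stops the loop as soon as the test would pass.

    demon_moves lists Demon's decrements (1 or 2) in the order he uses them.
    """
    c = START
    for step in demon_moves:
        if in_goal(c):
            break          # Angel chooses to stop repeating
        c -= step          # Demon resolves the dormant choice
    return in_goal(c)


def demon_beats_fixed_duration(n):
    """True if some Demon play makes the test fail after exactly n rounds."""
    return any(not in_goal(START - sum(moves))
               for moves in product((1, 2), repeat=n))


# The reactive strategy wins against every Demon play.
print(all(reactive_angel_wins(m) for m in product((1, 2), repeat=START)))
# Every duration announced in advance can be defeated by some Demon play.
print(all(demon_beats_fixed_duration(n) for n in range(START + 1)))
```

Ten rounds suffice for the exhaustive check because each round decreases c by at least 1 and Angel stops once c is at most 2.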

Coin Toss. Games are perfect-information and do not possess randomness in the
probabilistic sense, only (possibilistic) nondeterminism. This standard limitation 
is shown by attempting to express a coin-guessing game: 


$$\{\mathit{coin} := 0 \cap \mathit{coin} := 1\};\ \{\mathit{guess} := 0 \cup \mathit{guess} := 1\};\ {?\mathit{guess} = \mathit{coin}}$$


The Demon player sets the value of a tossed coin, but does so adversarially, 
not randomly, since strategies in CGL are pure strategies. The Angel player has 
perfect knowledge of coin and can set guess equivalently, thus easily passing 
the test guess = coin, unlike a real coin toss. Partial information games are 
interesting future work that could be implemented by limiting the variables 
visible in a strategy. 


Nim. Nim is the standard introductory example of a discrete, 2-player, zero- 
sum, perfect-information game. We consider misère play (last player loses) for 
a version of Nim that is also known as the subtraction game. The constant NIM 
defines the game Nim. 


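$$\begin{aligned}
\mathrm{Nim} = \{\ & \{\{c := c - 1 \cup c := c - 2 \cup c := c - 3\};\ ?c > 0\}; \\
& \{\{c := c - 1 \cup c := c - 2 \cup c := c - 3\};\ ?c > 0\}^d\ \}
\end{aligned}$$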


The game state consists of a single counter c containing a natural number, which 
each player chooses ($\cup$) to reduce by 1, 2, or 3 (c := c − k). The counter is nonnegative,
and the game repeats as long as Angel wishes, until some player empties
the counter, at which point that player is declared the loser (?c > 0). 


Proposition 1 (Dormant winning region). Suppose $c \equiv 1 \pmod 4$. Then
the dormant player has a strategy to ensure $c \equiv 1 \pmod 4$ as an invariant. That
is, the following CGL formula is valid (true in every state):

$$c > 0 \to c \bmod 4 = 1 \to [\mathrm{Nim}^*]\; c \bmod 4 = 1$$

This implies the dormant player wins the game because the active player 
violates the rules once c = 1 and no move is valid. We now state the winning 
region for an active player.

Proposition 2 (Active winning region). Suppose $c \in \{0, 2, 3\} \pmod 4$ initially,
and the active player controls the loop duration. Then the active player
can achieve $c \in \{2, 3, 4\}$:

$$c > 0 \to c \bmod 4 \in \{0, 2, 3\} \to \langle\mathrm{Nim}^*\rangle\; c \in \{2, 3, 4\}$$


At that point, the active player will win in one move by setting c = 1 which 
forces the dormant player to set c = 0 and fail the test ?c > 0. 
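
Both winning regions can be sanity-checked by exhaustive search over small counters. The sketch below is not the formal CGL proof; it assumes one natural pair of strategies (the dormant player answers a move k with 4 − k, and the active player always leaves the counter congruent to 1 modulo 4), and the function names are ours.

```python
from functools import lru_cache

MOVES = (1, 2, 3)


def dormant_keeps_invariant(c):
    """One round of Nim from c > 0 with c % 4 == 1, against every active move."""
    for k in MOVES:
        c1 = c - k
        if c1 <= 0:
            continue                 # active player fails ?c > 0 and loses
        c2 = c1 - (4 - k)            # dormant reply restores the residue
        if not (c2 > 0 and c2 % 4 == 1):
            return False
    return True


@lru_cache(maxsize=None)
def active_reaches_goal(c):
    """Active strategy from c > 0 with c % 4 in {0, 2, 3}, against every reply."""
    if c in (2, 3, 4):
        return True                  # active player stops the loop here
    k = {0: 3, 2: 1, 3: 2}[c % 4]    # leave the counter congruent to 1 mod 4
    c1 = c - k
    if c1 <= 0:
        return False
    return all(c1 - j > 0 and active_reaches_goal(c1 - j) for j in MOVES)


print(all(dormant_keeps_invariant(c) for c in range(1, 200, 4)))
print(all(active_reaches_goal(c) for c in range(2, 200) if c % 4 != 1))
```

The search covers only counters below 200, so it supports the propositions on that range rather than proving them.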

Cake-cutting. Another classic 2-player game, from the study of equitable division,
is the cake-cutting problem [40]: The active player cuts the cake in two,
then the (initially-)dormant player gets first choice of a piece. This is an optimal 
protocol for splitting the cake in the sense that the active player is incentivized 
to split the cake evenly, else the dormant player could take the larger piece. 
Cake-cutting is also a simple use case for fractional numbers. The constant CC 
defines the cake-cutting game. Here x is the relative size (from 0 to 1) of the 
first piece, y is the size of the second piece, a is the size of the active player’s 
piece, and d is the size of dormant player’s piece. 


$$\mathrm{CC} = x := *;\ ?(0 \leq x \leq 1);\ y := 1 - x;\ \{a := x;\ d := y \;\cap\; a := y;\ d := x\}$$


The game is played only once. The active player picks the division of the cake,
which must be a fraction $0 \leq x \leq 1$. The dormant player then picks which slice
goes to whom.

The active player has a tight strategy to achieve a 0.5 cake share, as stated 
in Proposition 3. 


Proposition 3 (Active winning region). The following formula is valid:
$$\langle\mathrm{CC}\rangle\; a \geq 0.5$$


The dormant player also has a computable strategy to achieve exactly 0.5 
share of the cake (Proposition 4). Division is fair because each player has a 
strategy to get their fair 0.5 share. 


Proposition 4 (Dormant winning region). The following formula is valid:
$$[\mathrm{CC}]\; d \geq 0.5$$


Computability and Numeric Types. Perfect fair division is only achieved for $a, d \in \mathbb{Q}$
because rational equality is decidable. Trichotomy ($a < 0.5 \vee a = 0.5 \vee a > 0.5$)
is a tautology, so the dormant player’s strategy can inspect the active player’s 
choice of a. Notably, we intend to support constructive reals in future work, for 
which exact equality is not decidable and trichotomy is not an axiom. Future 
work on real-valued CGL will need to employ approximate comparison techniques 
as is typical for constructive reals [11,13,51]. The examples in this section have 
been proven [12] using the calculus defined in Section 5. 
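
The two cake-cutting strategies can be illustrated with exact rational arithmetic. This is a sketch under our own encoding (the function names and the Boolean flag for the dormant choice are hypothetical, and the cut is assumed to satisfy 0 ≤ x ≤ 1); it is not the proof in the calculus.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def play_cc(x, dormant_takes_second):
    """Play CC once; x is the active player's cut. Returns the shares (a, d)."""
    if not (0 <= x <= 1):
        raise ValueError("active player fails the test 0 <= x <= 1")
    y = 1 - x
    # Dormant choice: either a := x; d := y  or  a := y; d := x.
    return (x, y) if dormant_takes_second else (y, x)


def dormant_takes_second(x):
    """Dormant strategy: take the second piece iff it is at least as large.

    The comparison is decidable because x is rational.
    """
    return 1 - x >= x


# Active strategy: cut exactly in half, so a >= 1/2 whatever the dormant player picks.
print([play_cc(HALF, t)[0] >= HALF for t in (True, False)])
# Dormant strategy: d >= 1/2 for every cut on a grid of rationals.
print(all(play_cc(Fraction(k, 20), dormant_takes_second(Fraction(k, 20)))[1] >= HALF
          for k in range(21)))
```

With constructive reals the comparison in `dormant_takes_second` would no longer be decidable, which is why the text anticipates approximate comparisons there.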


4 Semantics 


We now develop the semantics of CGL. In contrast to classical GL, whose seman- 
tics are well-understood [38], the major semantic challenge for CGL is capturing 
the competition between a constructive Angel and classical Demon. We base 
our approach on realizability semantics [37,33], because this approach makes the
relationship between constructive proofs and programs particularly clear, and
generating programs from CGL proofs is one of our motivations. 

Unlike previous applications of realizability, games feature two agents, and 
one could imagine a semantics with two realizers, one for each of Angel and 
Demon. However, we choose to use only one realizer, for Angel, which captures 
the fact that only Angel is restricted to a computable strategy, not Demon. 
Moreover, a single realizer makes it clear that Angel cannot inspect Demon’s 
strategy, only the game state, and also simplifies notations and proofs. Because 
Angel is computable but Demon is classical, our semantics has the flavor both 
of realizability semantics and of a traditional Kripke semantics for programs. 

The semantic functions employ game states $\omega \in \mathfrak{S}$ where we write $\mathfrak{S}$ for
the set of all states. We additionally write $\top, \bot \in \mathfrak{S}$ (not to be confused with
formulas tt and ff) for the pseudo-states $\top$ and $\bot$ indicating that Angel or
Demon respectively has won the game early by forcing the other to fail a test.
Each $\omega \in \mathfrak{S}$ maps each $x \in \mathcal{V}$ to a value $\omega(x) \in \mathbb{Q}$. We write $\omega_x^v$ for the state
that agrees with $\omega$ except that x is assigned value v where $v \in \mathbb{Q}$.


Definition 4 (Arithmetic term semantics). A term f is a computable function
of the state, so the interpretation $[\![f]\!]\omega$ of term f in state $\omega$ is $f(\omega)$.


4.1 Realizers 


To define the semantics of games, we first define realizers, the programs which 
implement strategies. The language of realizers is a higher-order lambda calculus 
where variables can range over game states, numbers, or realizers which realize a
given proposition $\phi$. Gameplay proceeds in continuation-passing style: invoking a
realizer returns another realizer which performs any further moves. We describe
the typing constraints for realizers informally, and say a is a $\langle\alpha\rangle\phi$-realizer
($a \in \langle\alpha\rangle\phi\ \mathrm{Rz}$) if it provides strategic decisions exactly when $\langle\alpha\rangle\phi$ demands them.


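Definition 5 (Realizers). The syntax of realizers $a, b, c \in \mathrm{Rz}$ (where Rz is
the set of all realizers) is defined coinductively:

$$\begin{aligned}
a, b, c ::=\ & x \mid () \mid (a, b) \mid \pi_L(a) \mid \pi_R(a) \mid (\lambda\omega : \mathfrak{S}.\ a(\omega)) \mid (\Lambda x : \mathbb{Q}.\ a) \\
\mid\ & (\Lambda x : \mathrm{Rz}.\ a) \mid a\ v \mid a\ b \mid a\ \omega \mid \mathsf{if}\ (f(\omega))\ a\ \mathsf{else}\ b
\end{aligned}$$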


where x is a program (or realizer) variable and f is a term over the state $\omega$. The
Roman a, b, c should not be confused with the Greek $\alpha, \beta, \gamma$ which range over
games. Realizers have access to the game state $\omega$, expressed by lambda realizers
$(\lambda\omega : \mathfrak{S}.\ a(\omega))$ which, when applied in a state $\nu$, compute the realizer a with
$\nu$ substituted for $\omega$. State lambdas $\lambda$ are distinguished from propositional and
first-order lambdas $\Lambda$. The unit realizer () makes no choices and is understood
as a unit tuple. Units () realize $f \sim g$ because rational comparisons, in contrast
to real comparisons, are decidable. Conditional strategic decisions are realized
by $\mathsf{if}\ (f(\omega))\ a\ \mathsf{else}\ b$ for computable function $f : \mathfrak{S} \to \mathbb{B}$, and execute a if f
returns truth, else b. Realizer $(\lambda\omega : \mathfrak{S}.\ f(\omega))$ is a $\langle\alpha \cup \beta\rangle\phi$-realizer if
$f(\omega) \in (\{0\} \times \langle\alpha\rangle\phi\ \mathrm{Rz}) \cup (\{1\} \times \langle\beta\rangle\phi\ \mathrm{Rz})$ for all $\omega$. The first component determines
which branch is taken, while the second component is a continuation which
must be able to play the corresponding branch. Realizer $(\lambda\omega : \mathfrak{S}.\ f(\omega))$ can also
be a $\langle x := *\rangle\phi$-realizer, which requires $f(\omega) \in \mathbb{Q} \times (\phi\ \mathrm{Rz})$ for all $\omega$. The first
component determines the value of x while the second component demonstrates
the postcondition $\phi$. The pair realizer (a, b) realizes both Angelic tests $\langle ?\phi\rangle\psi$ and
dormant choices $[\alpha \cup \beta]\phi$. It is identified with a pair of realizers: $(a, b) \in \mathrm{Rz} \times \mathrm{Rz}$.

A dormant realizer waits and remembers the active Demon’s moves, because
they typically inform Angel’s strategy once Angel resumes action. The first-order
realizer $(\Lambda x : \mathbb{Q}.\ b)$ is a $[x := *]\phi$-realizer when $b_x^v$ is a $\phi$-realizer for every $v \in \mathbb{Q}$;
Demon tells Angel the desired value of x, which informs Angel’s continuation b.
The higher-order realizer $(\Lambda x : \phi\ \mathrm{Rz}.\ b)$ realizes $[?\phi]\psi$ when $b_x^c$ realizes $\psi$ for every
$\phi$-realizer c. Demon announces the realizer for $\phi$ which Angel’s continuation b
may inspect. Tuples are inspected with projections $\pi_L(a)$ and $\pi_R(a)$. A lambda
is inspected by applying arguments $a\ \omega$ for state-lambdas, $a\ v$ for first-order,
and $a\ b$ for higher-order. Realizers for sequential compositions $\langle\alpha; \beta\rangle\phi$ (likewise
$[\alpha; \beta]\phi$) are $\langle\alpha\rangle\langle\beta\rangle\phi$-realizers: first $\alpha$ is played, and in every case the continuation
must play $\beta$ before showing $\phi$. Realizers for repetitions $\alpha^*$ are streams containing
$\alpha$-realizers, possibly infinite by virtue of coinductive syntax. Active loop realizer
ind(x. a) is the least fixed point of the equation b = [b/x]a, i.e., x is a recursive
call which must be invoked only in accordance with some well-order. We realize
dormant loops with gen(a, x.b, x.c), coinductively generated from initial value
a, update b, and post-step c with variable x for current generator value.

Active loops must terminate, so $\langle\alpha^*\rangle\phi$-realizers are constructed inductively
using any well-order on states. Dormant loops must be played as long as the
opponent wishes, so $[\alpha^*]\phi$-realizers are constructed coinductively, with the invariant
that $\phi$ has a realizer at every iteration.


4.2 Formula and Game Semantics 


A state $\omega$ paired with a realizer a that continues the game is called a possibility.
A region (written X, Y, Z) is a set of possibilities. We write $[\![\phi]\!] \subseteq \mathrm{Rz} \times \mathfrak{S}$ for
the region which realizes formula $\phi$. A formula $\phi$ is valid iff some a uniformly
realizes every state, i.e., $\{a\} \times \mathfrak{S} \subseteq [\![\phi]\!]$. A sequent $\Gamma \vdash \phi$ is valid iff the formula
$\bigwedge\Gamma \to \phi$ is valid, where $\bigwedge\Gamma$ is the conjunction of all assumptions in $\Gamma$.

The game semantics are region-oriented, i.e., they process possibilities in
bulk, though Angel commits to a strategy from the start. The region
$X\langle\!\langle\alpha\rangle\!\rangle : \wp(\mathrm{Rz} \times \mathfrak{S})$ is the union of all end regions of game $\alpha$ which arise when active
Angel commits to an element of X, then Demon plays adversarially. In
$X[\![\alpha]\!] : \wp(\mathrm{Rz} \times \mathfrak{S})$ Angel is the dormant player, but it is still Angel who commits
to an element of X and Demon who plays adversarially. Recall that pseudo-states
$\top$ and $\bot$ represent early wins by Angel and Demon, respectively. The definitions
below implicitly assume $\bot, \top \notin X$; they extend to the case $\bot \in X$ (likewise
$\top \in X$) using the equations $(X \cup \{\bot\})[\![\alpha]\!] = X[\![\alpha]\!] \cup \{\bot\}$ and
$(X \cup \{\bot\})\langle\!\langle\alpha\rangle\!\rangle = X\langle\!\langle\alpha\rangle\!\rangle \cup \{\bot\}$. That is, if Demon has already won by forcing an Angel violation
initially, any remaining game can be skipped with an immediate Demon victory,
and vice-versa. The game semantics exploit the Angelic projections $Z_{\langle 0\rangle}, Z_{\langle 1\rangle}$
and Demonic projections $Z_{[0]}, Z_{[1]}$, which represent binary decisions made by
a constructive Angel and a classical Demon, respectively. The Angelic projections,
which are defined $Z_{\langle 0\rangle} = \{(\pi_R(a), \omega) \mid \pi_L(a)(\omega) = 0,\ (a, \omega) \in Z\}$ and
$Z_{\langle 1\rangle} = \{(\pi_R(a), \omega) \mid \pi_L(a)(\omega) = 1,\ (a, \omega) \in Z\}$, filter by which branch Angel
chooses with $\pi_L(a)(\omega) \in \mathbb{B}$, then project the remaining strategy $\pi_R(a)$. The
Demonic projections, which are defined $Z_{[0]} = \{(\pi_L(a), \omega) \mid (a, \omega) \in Z\}$ and
$Z_{[1]} = \{(\pi_R(a), \omega) \mid (a, \omega) \in Z\}$, contain the same states as Z, but project the
realizer to tell Angel which branch Demon took.
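
The four projections can be mimicked on finite regions. In the sketch below, which is only an illustration, a region is a Python list of (realizer, state) pairs, a realizer is a pair whose left component is a function from states to a bit (for Angelic choices) or simply a pair of continuations (for Demonic choices), and states are dictionaries; all of these encodings and names are our assumptions.

```python
def angelic_projection(Z, bit):
    """Z<bit>: keep possibilities where Angel's left component picks `bit`,
    then continue with the right component."""
    return [(a[1], w) for (a, w) in Z if a[0](w) == bit]


def demonic_projection(Z, bit):
    """Z[bit]: keep every state, but project the realizer to the continuation
    for the branch Demon took (0 = left, 1 = right)."""
    return [(a[bit], w) for (a, w) in Z]


# Angel takes the right branch (1) exactly when x is negative.
angel = (lambda w: 1 if w["x"] < 0 else 0, "rest")
Z = [(angel, {"x": -3}), (angel, {"x": 5})]
print(angelic_projection(Z, 0))   # possibilities where Angel goes left
print(angelic_projection(Z, 1))   # possibilities where Angel goes right

# A dormant Angel prepares one continuation per Demonic branch.
D = [(("after-left", "after-right"), {"x": 1})]
print(demonic_projection(D, 0), demonic_projection(D, 1))
```

The asymmetry is visible in the code: the Angelic projection filters states by a computed decision, while the Demonic projection keeps every state because Demon may take either branch.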


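Definition 6 (Formula semantics). $[\![\phi]\!] \subseteq \mathrm{Rz} \times \mathfrak{S}$ is defined as:

$$\begin{aligned}
((), \omega) \in [\![f \sim g]\!] &\text{ iff } [\![f]\!]\omega \sim [\![g]\!]\omega \\
(a, \omega) \in [\![\langle\alpha\rangle\phi]\!] &\text{ iff } \{(a, \omega)\}\langle\!\langle\alpha\rangle\!\rangle \subseteq [\![\phi]\!] \cup \{\top\} \\
(a, \omega) \in [\![[\alpha]\phi]\!] &\text{ iff } \{(a, \omega)\}[\![\alpha]\!] \subseteq [\![\phi]\!] \cup \{\top\}
\end{aligned}$$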

Comparisons f ~ g defer to the term semantics, so the interesting cases are 
the game modalities. Both $[\alpha]\phi$ and $\langle\alpha\rangle\phi$ ask whether Angel wins $\alpha$ by following
the given strategy, and differ only in whether Demon vs. Angel is the active 
player, thus in both cases every Demonic choice must satisfy Angel’s goal, and 
early Demon wins are counted as Angel losses. 




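Definition 7 (Angel game forward semantics). We inductively define the
region $X\langle\!\langle\alpha\rangle\!\rangle : \wp(\mathrm{Rz} \times \mathfrak{S})$ in which $\alpha$ can end when active Angel plays X:

$$\begin{aligned}
X\langle\!\langle ?\phi\rangle\!\rangle &= \{(\pi_R(a), \omega) \mid (\pi_L(a), \omega) \in [\![\phi]\!] \text{ for some } (a, \omega) \in X\} \\
&\quad \cup \{\bot \mid (\pi_L(a), \omega) \notin [\![\phi]\!] \text{ for all } (a, \omega) \in X\} \\
X\langle\!\langle x := f\rangle\!\rangle &= \{(a, \omega_x^{[\![f]\!]\omega}) \mid (a, \omega) \in X\} \\
X\langle\!\langle x := *\rangle\!\rangle &= \{(\pi_R(a), \omega_x^{\pi_L(a)(\omega)}) \mid (a, \omega) \in X\} \\
X\langle\!\langle \alpha; \beta\rangle\!\rangle &= (X\langle\!\langle\alpha\rangle\!\rangle)\langle\!\langle\beta\rangle\!\rangle \\
X\langle\!\langle \alpha \cup \beta\rangle\!\rangle &= X_{\langle 0\rangle}\langle\!\langle\alpha\rangle\!\rangle \cup X_{\langle 1\rangle}\langle\!\langle\beta\rangle\!\rangle \\
X\langle\!\langle \alpha^*\rangle\!\rangle &= \bigcap\{Z_{\langle 0\rangle} \subseteq \mathrm{Rz} \times \mathfrak{S} \mid X \cup (Z_{\langle 1\rangle}\langle\!\langle\alpha\rangle\!\rangle) \subseteq Z\} \\
X\langle\!\langle \alpha^d\rangle\!\rangle &= X[\![\alpha]\!]
\end{aligned}$$

Definition 8 (Demon game forward semantics). We inductively define the
region $X[\![\alpha]\!] : \wp(\mathrm{Rz} \times \mathfrak{S})$ in which $\alpha$ can end when dormant Angel plays X:

$$\begin{aligned}
X[\![?\phi]\!] &= \{(a\ b, \omega) \mid (a, \omega) \in X,\ (b, \omega) \in [\![\phi]\!],\ \text{some } b \in \mathrm{Rz}\} \\
&\quad \cup \{\top \mid (a, \omega) \in X, \text{ but no } (b, \omega) \in [\![\phi]\!]\} \\
X[\![x := f]\!] &= \{(a, \omega_x^{[\![f]\!]\omega}) \mid (a, \omega) \in X\} \\
X[\![x := *]\!] &= \{(a\ r, \omega_x^r) \mid r \in \mathbb{Q}\} \\
X[\![\alpha; \beta]\!] &= (X[\![\alpha]\!])[\![\beta]\!] \\
X[\![\alpha \cup \beta]\!] &= X_{[0]}[\![\alpha]\!] \cup X_{[1]}[\![\beta]\!] \\
X[\![\alpha^*]\!] &= \bigcap\{Z_{[0]} \subseteq \mathrm{Rz} \times \mathfrak{S} \mid X \cup (Z_{[1]}[\![\alpha]\!]) \subseteq Z\} \\
X[\![\alpha^d]\!] &= X\langle\!\langle\alpha\rangle\!\rangle
\end{aligned}$$

Angelic tests $?\phi$ end in the current state $\omega$ with remaining realizer $\pi_R(a)$ if
Angel can realize $\phi$ with $\pi_L(a)$, else end in $\bot$. Angelic deterministic assignments
consume no realizer and simply update the state, then end. Angelic nondeterministic
assignments x := * ask the realizer $\pi_L(a)$ to compute a new value for x
from the current state. Angelic compositions $\alpha; \beta$ first play $\alpha$, then $\beta$ from the
resulting state using the resulting continuation. Angelic choice games $\alpha \cup \beta$ use
the Angelic projections to decide which branch is taken according to $\pi_L(a)$. The
realizer $\pi_R(a)$ may be reused between $\alpha$ and $\beta$, since $\pi_R(a)$ could just invoke
$\pi_L(a)$ if it must decide which branch has been taken. This definition of Angelic
choice (corresponding to constructive disjunction) captures the reality that realizers
in CGL, in contrast with most constructive logics, are entitled to observe
a game state, but they must do so in computable fashion.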


Repetition Semantics. In any GL, the challenge in defining the semantics of 
repetition games $\alpha^*$ is that the number of iterations, while finite, can depend
on both players’ actions and is thus not known in advance, while the DL-like
semantics of $\alpha^*$ as the finite reflexive, transitive closure of $\alpha$ gives an advance-notice
semantics. Classical GL provides the no-advance-notice semantics as a
fixed point [38], and we adopt the fixed point semantics as well. The Angelic
choice whether to stop ($Z_{\langle 0\rangle}$) or iterate the loop ($Z_{\langle 1\rangle}$) is analogous to the case
for $\alpha \cup \beta$.
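
To convey the flavor of a fixed-point characterization on a concrete loop, the sketch below computes a least fixed point by plain iteration for the Demonic Counter game of Section 3.1. It is deliberately simplified and is not the forward realizability semantics defined above: it computes a classical-style backward winning region over a finite range of counter values, and the state range, the goal 0 ≤ c ≤ 2, and all names are assumptions of this illustration.

```python
def least_fixed_point(step, bottom=frozenset()):
    """Iterate a monotone set operator from `bottom` until it stabilizes."""
    current = bottom
    while True:
        nxt = step(current)
        if nxt == current:
            return current
        current = nxt


STATES = range(-2, 11)                      # finite slice of counter values
GOAL = frozenset(c for c in STATES if 0 <= c <= 2)


def step(W):
    """Angel wins from c if she may stop inside GOAL, or if she iterates and
    every Demon move (decrement by 1 or by 2) lands in the region W."""
    return GOAL | frozenset(c for c in STATES if c - 1 in W and c - 2 in W)


winning = least_fixed_point(step)
print(sorted(winning))
print(10 in winning)   # the initial counter value of the example game
```

No iteration count appears in the computation: the region grows one layer at a time, which mirrors the point that the number of repetitions is not announced in advance.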


Duality Semantics. To play the dual game $\alpha^d$, the active and dormant players
switch roles, then play $\alpha$. In classical GL, this characterization of duality is interchangeable
with the definition of $\alpha^d$ as the game that Angel wins exactly when
it is impossible for Angel to lose. The characterizations are not interchangeable
in CGL because the Determinacy Axiom (all games have winners) of GL is not
valid in CGL:


Remark 1 (Indeterminacy). Classically equivalent determinacy axiom schemata
$\neg\langle\alpha\rangle\neg\phi \to [\alpha]\phi$ and $\langle\alpha\rangle\neg\phi \vee [\alpha]\phi$ of classical GL are not valid in CGL, because
they imply double negation elimination.


Remark 2 (Classical duality). In classical GL, Angelic dual games are characterized
by the axiom schema $\langle\alpha^d\rangle\phi \leftrightarrow \neg\langle\alpha\rangle\neg\phi$, which is not valid in CGL. It is
classically interdefinable with $\langle\alpha^d\rangle\phi \leftrightarrow [\alpha]\phi$.


The determinacy axiom is not valid in CGL, so we take $\langle\alpha^d\rangle\phi \leftrightarrow [\alpha]\phi$ as primary.


4.3 Demonic Semantics 


Demon wins a Demonic test by presenting a realizer b as evidence that the
precondition holds. If he cannot present a realizer (i.e., because none exists), then
the game ends in $\top$, so Angel wins by default. Otherwise, Angel's higher-order realizer a
consumes the evidence of the precondition, i.e., Angelic strategies are entitled to
depend (computably) on how Demon demonstrated the precondition. Angel can
check that Demon passed the test by executing b. The Demonic repetition game
$\alpha^*$ is defined as a fixed point [42] with Demonic projections. Computationally,
a winning invariant for the repetition is the witness of its winnability.


The remaining cases are innocuous by comparison. Demonic deterministic
assignments x := f deterministically store the value of f in x, just as Angelic
assignments do. In Demonic nondeterministic assignment x := *, Demon chooses
to set x to any value. When Demon plays the choice game $\alpha \cup \beta$, Demon chooses
classically between $\alpha$ and $\beta$. The dual game $\alpha^d$ is played by Demon becoming
dormant and Angel becoming active in $\alpha$.


Semantics Examples. The realizability semantics of games are subtle on a first
read, so we provide examples of realizers. In these examples, the state argument
$\omega$ is implicit, and we refer to $\omega(x)$ simply as x for brevity.


Recall that $[?\phi]\psi$ and $\phi \to \psi$ are equivalent. For any $\phi$, the identity function
$(\Lambda x : \phi\ \mathrm{Rz}.\ x)$ is a $\phi \to \phi$-realizer: for every $\phi$-realizer x which Demon presents,
Angel can present the same x as evidence of $\phi$. This confirms expected behavior
per propositional constructive logic: the identity function is the proof of
self-implication.


In the example formula $\langle x := *^d; \{x := x \cup x := -x\}\rangle\, x \ge 0$, Demon gets to set x,
then Angel decides whether to negate x in order to make it nonnegative. It is
realized by $\lambda x : \mathbb{Q}.\ ((\text{if } (x < 0)\ 1\ \text{else}\ 0), ())$: Demon announces the value of
x, then Angel's strategy is to check the sign of x, taking the right branch when
x is negative. Each branch contains a deterministic assignment which consumes
no realizer, then the postcondition $x \ge 0$ has trivial realizer ().
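
The strategy described above can be played out concretely. The following is a minimal sketch, assuming rationals are modeled by Python's `Fraction`; the names `angel_branch` and `play` are illustrative and do not appear in the paper.

```python
from fractions import Fraction

def angel_branch(x):
    """Angel's choice: 1 selects the right branch (x := -x), 0 the left (x := x)."""
    return 1 if x < 0 else 0

def play(demon_x):
    """Play the game once: Demon sets x, Angel picks a branch, return the final x."""
    x = Fraction(demon_x)          # Demon's nondeterministic assignment x := *
    if angel_branch(x) == 1:
        x = -x                     # right branch: x := -x
    # left branch x := x leaves the state unchanged
    return x

# Whatever value Demon announces, Angel reaches a nonnegative x.
for choice in ["-7/2", "0", "5"]:
    assert play(choice) >= 0
```

The sketch only exercises the branch-selection component of the realizer; the trivial realizer () for the postcondition corresponds to the final comparison succeeding.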


Consider the formula $\langle\{x := x + 1\}^*\rangle\, x > y$, where Angel's winning strategy
is to repeat the loop until x > y, which will occur as x increases. The realizer
is ind(w. (if (x > y) (0, ()) else (1, w), ())), which says that Angel stops the
loop if x > y and proves the postcondition with a trivial strategy. Otherwise Angel
continues the loop, whose body consumes no realizer, and supplies the inductive
call w to continue the strategy inductively.
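
As a rough illustration, the inductive realizer behaves like an ordinary loop whose guard is Angel's stop-or-go decision. This sketch assumes rational values modeled by `Fraction`; `angel_loop` is a hypothetical name, and the iteration counter is added only for inspection.

```python
from fractions import Fraction

def angel_loop(x, y):
    """Repeat x := x + 1 until x > y; return the final x and the iteration count."""
    x, y = Fraction(x), Fraction(y)
    iterations = 0
    while not (x > y):     # the "else (1, w)" case: go around the loop again
        x += 1             # loop body, which consumes no realizer
        iterations += 1
    return x, iterations   # the "(0, ())" case: stop, postcondition x > y holds
```

The number of iterations is finite for every starting state but depends on the state, which is the no-advance-notice behavior the fixed-point semantics is meant to capture.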


Consider the formula $[?x > 0; \{x := x + 1\}^*]\,\exists y\,(y \le x \wedge y > 0)$ for a subtle
example. Our strategy for Angel is to record the initial value of x in y, then
maintain a proof that $y \le x$ as x increases. This strategy is represented by
$\Lambda w : (x > 0)\ \mathrm{Rz}.\ \mathrm{gen}((x, ((), w)),\ z.(\pi_L(z), ((), \pi_R(\pi_R(z)))),\ z.z)$. That is, initially
Demon announces a proof w of x > 0. Angel specifies the initial element of the
realizer stream by witnessing $\exists y\,(y \le x \wedge y > 0)$ with $c_0 = (x, ((), w))$, where the
first component instantiates y = x, the trivial second component indicates that
$y \le y$ trivially, and the third component reuses w as a proof of y > 0. Demon can
choose to repeat the loop arbitrarily. When Demon demands the k'th repetition,
z is bound to $c_{k-1}$ to compute $c_k = (\pi_L(z), ((), \pi_R(\pi_R(z))))$, which plays the
next iteration. That is, at each iteration Angel witnesses $\exists y\,(y \le x \wedge y > 0)$ by
assigning the same value (stored in $\pi_L(z)$) to y, reproving $y \le x$ with (), then
reusing the proof (stored in $\pi_R(\pi_R(z))$) that y > 0.
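
The stream realizer can be approximated by a Python generator. This is a sketch under stated assumptions: the proof w is represented by an opaque placeholder string, and `angel_stream` and `holds_for_rounds` are hypothetical helper names rather than anything defined in the paper.

```python
from fractions import Fraction
from itertools import islice

def angel_stream(x0, w="w"):
    """Yield c_0, c_1, ...: one witness (y, ((), proof of y > 0)) per repetition."""
    c = (Fraction(x0), ((), w))        # c_0 = (x, ((), w)): instantiate y = x
    while True:
        yield c
        c = (c[0], ((), c[1][1]))      # c_k = (pi_L(z), ((), pi_R(pi_R(z))))

def holds_for_rounds(x0, rounds):
    """Check y <= x and y > 0 before each of the given number of repetitions."""
    x = Fraction(x0)
    if not x > 0:                      # Demon fails the test ?x > 0
        return True                    # Angel wins by default
    for y, _ in islice(angel_stream(x0), rounds):
        if not (y <= x and y > 0):
            return False
        x += 1                         # Demon demands another repetition
    return True
```

Because the generator is lazy, Angel commits to no bound on the number of repetitions; each witness is produced only when Demon asks for it.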

5 Proof Calculus


Having settled on the meaning of a game in Section 4, we proceed to develop a
calculus for proving CGL formulas syntactically. The motivation is twofold. The
practical motivation, as always, is that when verifying a concrete example, the
realizability semantics provide a notion of ground truth, but are impractical for proving
large formulas. The theoretical motivation is that we wish to expose the
computational interpretation of the modalities $\langle\alpha\rangle\phi$ and $[\alpha]\phi$ as the types of the players'
respective winning strategies for game $\alpha$ that has $\phi$ as its goal condition. Since
CGL is constructive, such a strategy constructs a proof of the postcondition $\phi$.

To study the computational nature of proofs, we write proof terms explicitly:
the main proof judgement $\Gamma \vdash M : \phi$ says proof term M is a proof of $\phi$ in context
$\Gamma$, or equivalently a proof of sequent $(\Gamma \vdash \phi)$. We write M, N, O (sometimes
A, B, C) for arbitrary proof terms, and $p, q, \ell, r, s, g$ for proof variables, that is
variables that range over proof terms of a given proposition. In contrast to the
assignable program variables, the proof variables are given their meaning by
substitution and are scoped locally, not globally. We adapt propositional proof
terms such as pairing, disjoint union, and lambda-abstraction to our context of
game logic. To support first-order games, we include first-order proof terms and
new terms for features: dual, assignment, and repetition games.

We now develop the calculus by starting with standard constructs and working
toward the novel constructs of CGL. The assumptions p in $\Gamma$ are named,
so that they may appear as variable proof terms p. We write $\Gamma_x^y$ and $M_x^y$ for
the renaming of program variable x to y and vice versa in context $\Gamma$ or proof
term M, respectively. Proof rules for state-modifying constructs explicitly perform
renamings, which both ensures they are applicable as often as possible and also
ensures that references to proof variables support an intuitive notion of lexical
scope. Likewise $\Gamma_x^f$ and $M_x^f$ are the substitutions of term f for program variable
x. We use distinct notation to substitute proof terms for proof variables while
avoiding capture: $[N/p]M$ substitutes proof term N for proof variable p in proof
term M. Some proof terms such as pairs prove both a diamond formula and a
box formula. We write $\langle M, N\rangle$ and $[M, N]$ respectively to distinguish the terms
or $\langle[M, N]\rangle$ to treat them uniformly. Likewise we abbreviate $\langle[\alpha]\rangle\phi$ when the same
rule works for both diamond and box modalities, using $[\langle\alpha\rangle]\phi$ to denote its dual
modality. The proof terms $\langle x := f_x^y\ \text{in}\ p.\ M\rangle$ and $[x := f_x^y\ \text{in}\ p.\ M]$ introduce
an auxiliary ghost variable y for the old value of x, which improves completeness
without requiring manual ghost steps.

The propositional proof rules of CGL are in Fig. 1. Formula $[?\phi]\psi$ is constructive
implication, so rule [?]E with proof term M N eliminates M by supplying
an N that proves the test condition. Lambda terms $(\lambda p : \phi.\ M)$ are introduced
by rule [?]I by extending the context $\Gamma$. While this rule is standard, it is worth
emphasizing that here p is a proof variable for which a proof term (like N in [?]E)
may be substituted, and that the game state is untouched by [?]I. Constructive
disjunction (between the branches $\langle\alpha\rangle\phi$ and $\langle\beta\rangle\phi$) is the choice $\langle\alpha \cup \beta\rangle\phi$. The
introduction rules for injections are $\langle\cup\rangle$I1 and $\langle\cup\rangle$I2, and case analysis is
performed with rule $\langle\cup\rangle$E, with two branches that prove a common consequence
from each disjunct. The cases $\langle?\phi\rangle\psi$ and $[\alpha \cup \beta]\phi$ are conjunctive. Conjunctions
are introduced by $\langle?\rangle$I and $[\cup]$I as pairs, and eliminated by $\langle?\rangle$E1, $\langle?\rangle$E2, $[\cup]$E1,
and $[\cup]$E2 as projections. Lastly, rule hyp says formulas in the context hold by
assumption.

$$[?]\text{E}\quad \frac{\Gamma \vdash M : [?\phi]\psi \qquad \Gamma \vdash N : \phi}{\Gamma \vdash (M\ N) : \psi}$$

Fig. 1. CGL proof calculus: Propositional rules

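We now begin considering non-propositional rules, starting with the simplest
ones. The majority of the rules in Fig. 2, while thoroughly useful in proofs,
are computationally trivial. The repetition rules ([*]E, [*]R) fold and unfold the
notion of repetition as iteration. The rolling and unrolling terms are named in
analogy to the iso-recursive treatment of recursive types [50], where an explicit
operation is used to expand and collapse the recursive definition of a type.
Rules $\langle*\rangle$C, $\langle*\rangle$S, $\langle*\rangle$G are the destructor and injectors for $\langle\alpha^*\rangle\phi$, which are
similar to those for $\langle\alpha \cup \beta\rangle\phi$. The duality rules ($\langle[d]\rangle$I) say the dual game is proved
by proving the game where roles are reversed. The sequencing rules ($\langle[;]\rangle$I) say a
sequential game is played by playing the first game with the goal of reaching a
state where the second game is winnable.

$$\langle*\rangle\text{C}\quad \frac{\Gamma \vdash A : \langle\alpha^*\rangle\phi \qquad \Gamma, s{:}\phi \vdash B : \psi \qquad \Gamma, g{:}\langle\alpha\rangle\langle\alpha^*\rangle\phi \vdash C : \psi}{\Gamma \vdash \langle\mathrm{case}_*\ A\ \text{of}\ s \Rightarrow B \mid g \Rightarrow C\rangle : \psi}$$

$$\text{M}\quad \frac{\Gamma \vdash M : \langle[\alpha]\rangle\phi \qquad \Gamma_{BV(\alpha)}^{y}, p{:}\phi \vdash N : \psi}{\Gamma \vdash M \circ_p N : \langle[\alpha]\rangle\psi}$$

$$[*]\text{E}\quad \frac{\Gamma \vdash M : [\alpha^*]\phi}{\Gamma \vdash [\mathrm{unroll}\ M] : \phi \wedge [\alpha][\alpha^*]\phi} \qquad\qquad \langle[d]\rangle\text{I}\quad \frac{\Gamma \vdash M : [\langle\alpha\rangle]\phi}{\Gamma \vdash \langle[\mathrm{yield}\ M]\rangle : \langle[\alpha^d]\rangle\phi}$$

$$\langle*\rangle\text{S}\quad \frac{\Gamma \vdash M : \phi}{\Gamma \vdash \langle\mathrm{stop}\ M\rangle : \langle\alpha^*\rangle\phi} \qquad\qquad [*]\text{R}\quad \frac{\Gamma \vdash M : \phi \wedge [\alpha][\alpha^*]\phi}{\Gamma \vdash [\mathrm{roll}\ M] : [\alpha^*]\phi}$$

$$\langle*\rangle\text{G}\quad \frac{\Gamma \vdash M : \langle\alpha\rangle\langle\alpha^*\rangle\phi}{\Gamma \vdash \langle\mathrm{go}\ M\rangle : \langle\alpha^*\rangle\phi} \qquad\qquad \langle[;]\rangle\text{I}\quad \frac{\Gamma \vdash M : \langle[\alpha]\rangle\langle[\beta]\rangle\phi}{\Gamma \vdash \langle[\iota\ M]\rangle : \langle[\alpha; \beta]\rangle\phi}$$

Fig. 2. CGL proof calculus: Some non-propositional rules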

Among these rules, monotonicity M is especially computationally rich. The
notation $\Gamma_{BV(\alpha)}^{y}$ says that in the second premiss, the assumptions in $\Gamma$ have
all bound variables of $\alpha$ (written $BV(\alpha)$) renamed to fresh variables y for
completeness. In practice, $\Gamma$ usually contains some assumptions on variables that
are not bound, which we wish to access without writing them explicitly in $\phi$.
Rule M is used to execute programs right-to-left, giving shorter, more efficient
proofs. It can also be used to derive the Hoare-logical sequential composition
rule, which is frequently used to reduce the number of case splits. Note that like
every GL, CGL is subnormal, so the modal modus ponens axiom K and Gödel
generalization (or necessitation) rule G are not sound, and M takes over much of
the role they usually serve. On the surface, M simply says games are monotonic:
a game's goal proposition may freely be replaced with a weaker one. From a
computational perspective, Section 7 will show that rule M can be (lazily)
eliminated. Moreover, M is an admissible rule, one whose instances can all be derived
from existing rules. When proofs are written right-to-left with M, the
normalization relation translates them to left-to-right normal proofs. Note also that in
checking $M \circ_p N$, the context $\Gamma$ has the bound variables of $\alpha$ renamed freshly to
some y within N, as required to maintain soundness across execution of $\alpha$.

Next, we consider first-order rules, i.e., those which deal with first-order
programs that modify program variables. The first-order rules are given in Fig. 3.
In $\langle{:}*\rangle$E, $FV(\psi)$ are the free variables of $\psi$, the variables which can influence
its meaning. Nondeterministic assignment provides quantification over
rational-valued program variables. Rule $[{:}*]$I is universal, with proof term $(\lambda x : \mathbb{Q}.\ M)$.
While this notation is suggestive, the difference vs. the function proof term
$(\lambda p : \phi.\ M)$ is essential: the proof term M is checked (resp. evaluated) in a state
where the program variable x has changed from its initial value. For soundness,
$[{:}*]$I renames x to fresh program variable y throughout context $\Gamma$, written $\Gamma_x^y$.
This means that M can freely refer to all facts of the full context, but they
now refer to the state as it was before x received a new value. Elimination $[{:}*]$E
then allows instantiating x to a term f. Existential quantification is introduced
by $\langle{:}*\rangle$I, whose proof term $\langle f_x^y\ {:}{*}\ p.\ M\rangle$ is like a dependent pair plus bound
renaming of x to y. The witness f is an arbitrary computable term, as always.
We write $\langle f_x^y\ {:}{*}\ M\rangle$ for short when p is not referenced in M. It is eliminated in
$\langle{:}*\rangle$E by unpacking the pair, with side condition $x \notin FV(\psi)$ for soundness. The
assignment rules $\langle[{:=}]\rangle$I do not quantify, per se, but always update x to the value
of the term f, and in doing so introduce an assumption that x and f (suitably
renamed) are now equal. In $\langle{:}*\rangle$I and $\langle[{:=}]\rangle$I, program variable y is fresh.

$$\langle[{:=}]\rangle\text{I}\quad \frac{\Gamma_x^y, p{:}(x = f_x^y) \vdash M : \phi}{\Gamma \vdash \langle[x := f_x^y\ \text{in}\ p.\ M]\rangle : \langle[x := f]\rangle\phi}\quad (y, p\ \text{fresh})$$

$$\langle{:}*\rangle\text{I}\quad \frac{\Gamma_x^y, p{:}(x = f_x^y) \vdash M : \phi}{\Gamma \vdash \langle f_x^y\ {:}{*}\ p.\ M\rangle : \langle x := *\rangle\phi}\quad (y, p\ \text{fresh},\ f\ \text{comp.})$$

$$\langle{:}*\rangle\text{E}\quad \frac{\Gamma \vdash M : \langle x := *\rangle\phi \qquad \Gamma_x^y, p{:}\phi \vdash N : \psi}{\Gamma \vdash \mathrm{unpack}(M, py.\ N) : \psi}\quad (y\ \text{fresh},\ x \notin FV(\psi))$$

$$[{:}*]\text{I}\quad \frac{\Gamma_x^y \vdash M : \phi}{\Gamma \vdash (\lambda x : \mathbb{Q}.\ M) : [x := *]\phi}\quad (y\ \text{fresh}) \qquad\qquad [{:}*]\text{E}\quad \frac{\Gamma \vdash M : [x := *]\phi}{\Gamma \vdash (M\ f) : \phi_x^f}$$

Fig. 3. CGL proof calculus: first-order games


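$$\langle*\rangle\text{I}\quad \frac{\Gamma \vdash A : \varphi \qquad p{:}\varphi, q{:}(M_0 = M \succ 0) \vdash B : \langle\alpha\rangle(\varphi \wedge M_0 \succ M) \qquad p{:}\varphi, q{:}(M = 0) \vdash C : \phi}{\Gamma \vdash \mathrm{for}(p : \varphi(M) = A;\ q.\ B;\ C)\{\alpha\} : \langle\alpha^*\rangle\phi}\quad (M_0\ \text{fresh})$$

$$[*]\text{I}\quad \frac{\Gamma \vdash M : J \qquad p{:}J \vdash N : [\alpha]J \qquad p{:}J \vdash O : \phi}{\Gamma \vdash (M\ \mathrm{rep}\ p : J.\ N\ \text{in}\ O) : [\alpha^*]\phi}$$

$$\text{FP}\quad \frac{\Gamma \vdash A : \langle\alpha^*\rangle\phi \qquad s{:}\phi \vdash B : \psi \qquad g{:}\langle\alpha\rangle\psi \vdash C : \psi}{\Gamma \vdash \mathrm{FP}(A, s.\ B, g.\ C) : \psi} \qquad\qquad \text{split}\quad \frac{}{\Gamma \vdash (\mathrm{split}\ [f, g]) : f \le g \vee f > g}$$


Fig. 4. CGL proof calculus: loops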


The looping rules in Fig. 4, especially $\langle*\rangle$I, are arguably the most
sophisticated in CGL. Rule $\langle*\rangle$I provides a strategy to repeat a game $\alpha$ until the
postcondition $\phi$ holds. This is done by exhibiting a convergence predicate $\varphi$ and
termination metric M with terminal value 0 and well-ordering $\succ$. Proof term A
shows $\varphi$ holds initially. Proof term B guarantees M decreases with every iteration,
where $M_0$ is a fresh metric variable which is equal to M at the antecedent
of B and is never modified. Proof term C allows any postcondition $\phi$ which
follows from convergence $\varphi \wedge M = 0$. Proof term $\mathrm{for}(p : \varphi(M) = A;\ q.\ B;\ C)\{\alpha\}$
suggests the computational interpretation as a for-loop: proof A shows the
convergence predicate holds in the initial state, B shows that each step reduces the
termination metric while maintaining the predicate, and C shows that the
postcondition follows from the convergence predicate upon termination. The game $\alpha$
repeats until convergence is reached (M = 0). By the assumption that metrics
are well-founded, convergence is guaranteed in finitely (but arbitrarily) many
iterations.
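
The for-loop reading can be mimicked at runtime by checking the three obligations dynamically instead of proving them. This is only an illustrative sketch: it assumes a natural-number-valued metric (so that the usual order on integers is well-founded), and `run_for` with its parameters is a hypothetical helper, not part of CGL.

```python
def run_for(state, step, metric, predicate):
    """Iterate step until metric(state) == 0, checking the loop obligations.

    predicate plays the role of the convergence predicate, metric the role of
    the termination metric M, and m0 the role of the ghost variable M_0.
    """
    assert predicate(state), "obligation A: predicate must hold initially"
    iterations = 0
    while metric(state) != 0:
        m0 = metric(state)                      # remember M_0 = M before the step
        state = step(state)                     # play the game body once
        assert predicate(state), "obligation B: predicate must be maintained"
        assert 0 <= metric(state) < m0, "obligation B: metric must decrease"
        iterations += 1
    return state, iterations                    # obligation C applies here: M = 0
```

For instance, `run_for(5, lambda x: x - 1, lambda x: x, lambda x: x >= 0)` counts x down to 0, while a step that fails to decrease the metric trips an assertion rather than looping forever.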

A naive, albeit correct, reading of rule $\langle*\rangle$I says M is literally some term f.
If lexicographic or otherwise non-scalar metrics should be needed, it suffices to
interpret $\varphi$ and $M_0 \succ M$ as formulas over several scalar variables.

Rule FP says $\langle\alpha^*\rangle\phi$ is a least pre-fixed-point. That is, if we wish to show a
formula $\psi$ holds now, we show that $\psi$ is any pre-fixed-point; then it must hold
as it is no lesser than $\langle\alpha^*\rangle\phi$. Rule [*]I is the well-understood induction rule for loops,
which applies as well to repeated games. Premiss O ensures [*]I supports any
provable postcondition, which is crucial for eliminating M in Lemma 7. The
elimination form for $[\alpha^*]\phi$ is simply [*]E. Like any program logic, reasoning in CGL
consists of first applying program-logic rules to decompose a program until the
program has been entirely eliminated, then applying first-order logic principles
at the leaves of the proof. The constructive theory of rationals is undecidable
because it can express the undecidable [47] classical theory of rationals. Thus
facts about rationals require proof in practice. For the sake of space and since our
focus is on program reasoning, we defer an axiomatization of rational arithmetic
to future work. We provide a (non-effective!) rule FO which says valid first-order
formulas are provable.


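$$\text{FO}\quad \frac{\Gamma \vdash M : \rho}{\Gamma \vdash \mathrm{FO}[\phi](M) : \phi}\quad (\text{exists}\ a\ \text{s.t.}\ \{a\} \times \mathfrak{S} \subseteq [\![\rho \to \phi]\!],\ \ \rho, \phi\ \text{F.O.})$$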


An effective special case of FO is split (Fig. 4), which says all term comparisons
are decidable. Rule split can be generalized to decide termination metrics
($M = 0 \vee M \succ 0$). Rule iG says the value of term f can be remembered in fresh ghost
variable x:


$$\text{iG}\quad \frac{\Gamma, p{:}(x = f) \vdash M : \phi}{\Gamma \vdash \mathrm{Ghost}[x = f](p.\ M) : \phi}\quad (x\ \text{fresh except free in}\ M,\ p\ \text{fresh})$$


Rule iG can be defined using arithmetic and quantifiers:
$$\mathrm{Ghost}[x = f](p.\ M) \equiv (\lambda x : \mathbb{Q}.\ (\lambda p : (x = f).\ M))\ f\ (\mathrm{FO}[f = f]())$$


What’s Novel in the CGL Calculus? CGL extends first-order reasoning with game 
reasoning (sequencing [32], assignments, iteration, and duality). The combina- 
tion of first-order reasoning with game reasoning is synergistic: for example, 
repetition games are known to be more expressive than repetition systems [42]. 
We give a new natural-deduction formulation of monotonicity. Monotonicity is 
admissible and normalization translates monotonicity proofs into monotonicity- 
free proofs. In doing so, normalization shows that right-to-left proofs can be 
(lazily) rewritten as left-to-right. Additionally, first-order games are rife with 
changing state, and soundness requires careful management of the context $\Gamma$.
The extended version [12] uses our calculus to prove the example formulas. 


6 Theory: Soundness 


Full versions of proofs outlined in this paper are given in the extended
version [12]. We have introduced a proof calculus for CGL which can prove winning
strategies for NIM and CC. For any new proof calculus, it is essential to
convince ourselves of its soundness, which can be done within several prominent
schools of thought. In proof-theoretic semantics, for example, the proof rules are
taken as the ground truth, but are validated by showing the rules obey expected
properties such as harmony or, for a sequent calculus, cut-elimination. While we
will investigate proof terms separately (Section 8), we are already equipped to
show soundness by direct appeal to the realizability semantics (Section 4), which
we take as an independent notion of ground truth. We show soundness of CGL
proof rules against the realizability semantics, i.e., that every provable
natural-deduction sequent is valid. An advantage of this approach is that it explicitly
connects the notions of provability and computability. We build up to the proof
of soundness by proving lemmas on structurality, renaming and substitution.


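Lemma 1 (Structurality). The structural rules W, X, and C are admissible,
i.e., the conclusions are provable whenever the premisses are provable.

$$\text{W}\quad \frac{\Gamma \vdash M : \rho}{\Gamma, p{:}\phi \vdash M : \rho} \qquad \text{X}\quad \frac{\Gamma, p{:}\phi, q{:}\psi \vdash M : \rho}{\Gamma, q{:}\psi, p{:}\phi \vdash M : \rho} \qquad \text{C}\quad \frac{\Gamma, p{:}\phi, q{:}\phi \vdash M : \rho}{\Gamma, p{:}\phi \vdash [p/q]M : \rho}$$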


Proof summary. Each rule is proved admissible by induction on M. Observe that
the only premisses regarding $\Gamma$ are of the form $\Gamma(p) = \phi$, which are preserved
under weakening. Premisses are trivially preserved under exchange because
contexts are treated as sets, and preserved modulo renaming by contraction as it
suffices to have any assumption of a given formula, regardless of its name. The
context $\Gamma$ is allowed to vary in applications of the inductive hypothesis, e.g., in
rules that bind program variables. Some rules discard $\Gamma$ in checking the subterms
inductively, in which case the IH need not be applied at all.




Lemma 2 (Uniform renaming). Let $M_x^y$ be the renaming of program variable
x to y (and vice versa) within M, even when neither x nor y is fresh. If $\Gamma \vdash M : \phi$
then $\Gamma_x^y \vdash M_x^y : \phi_x^y$.


Proof summary. Straightforward induction on the structure of M. Renaming
within proof terms (whose definition we omit as it is quite tedious) follows
the usual homomorphisms, from which the inductive cases follow. In the case
that M is a proof variable z, then $(\Gamma_x^y)(z) = \Gamma(z)_x^y$, from which the case
follows. The interesting cases are those which modify program variables, e.g.,
$\langle z := f_z^w\ \text{in}\ p.\ M\rangle$. The bound variable z is renamed to $z_x^y$, while the auxiliary
variable w is $\alpha$-varied if necessary to maintain freshness. Renaming then happens
recursively in M.


Substitution will use proofs of coincidence and bound effect lemmas. 


Lemma 3 (Coincidence). Only the free variables of an expression influence 
its semantics. 


Lemma 4 (Bound effect). Only the bound variables of a game are modified 
by execution. 


Summary. By induction on the expression, in analogy to [43]. 


Definition 9 (Term substitution admissibility). For simplicity, we say $\phi_x^f$
(likewise for context $\Gamma$, term f, game $\alpha$, and proof term M) is admissible if $\phi$
binds neither x nor free variables of f.


The latter condition can be relaxed in practice [44] to requiring $\phi$ does not
mention x under bindings of free variables.

Lemma 5 (Arithmetic-term substitution). If $\Gamma \vdash M : \phi$ and the substitutions
$\Gamma_x^f$, $M_x^f$ and $\phi_x^f$ are admissible, then $\Gamma_x^f \vdash M_x^f : \phi_x^f$.


Summary. By induction on M. Admissibility holds recursively, and so can be 
assumed at each step of the induction. For non-atomic M that bind no variables, 
the proof follows from the inductive hypotheses. For M that bind variables, we 
appeal to Lemma 3 and Lemma 4. 


Just as arithmetic terms are substituted for program variables, proof terms 
are substituted for proof variables. 


Lemma 6 (Proof term substitution). Let $[N/p]M$ substitute N for p in M,
avoiding capture. If $\Gamma, p{:}\psi \vdash M : \phi$ and $\Gamma \vdash N : \psi$ then $\Gamma \vdash [N/p]M : \phi$.


Proof. By induction on M, appealing to renaming, coincidence, and bound effect.
When substituting N for p into a term that binds program variables such
as $\langle z := f_z^y\ \text{in}\ q.\ M\rangle$, we avoid capture by renaming within occurrences of N
in the recursive call, i.e., $[N/p]\langle z := f_z^y\ \text{in}\ q.\ M\rangle = \langle z := f_z^y\ \text{in}\ q.\ [N_z^y/p]M\rangle$,
preserving soundness by Lemma 2.


Soundness of the proof calculus exploits renaming and substitution. 


Theorem 1 (Soundness of proof calculus). If $\Gamma \vdash M : \phi$ then $(\Gamma \vdash \phi)$ is
valid. As a special case for empty context $\cdot$, if $\cdot \vdash M : \phi$, then $\phi$ is valid.


Proof summary. By induction on M. Modus ponens case A B reduces to Lemma 6. 
Cases that bind program variables, such as assignment, hold by Lemma 5 and 
Lemma 2. Rule W is employed when substituting under a binder. 


We have now shown that the CGL proof calculus is sound, the sine qua non 
condition of any proof system. Because soundness was w.r.t. a realizability se- 
mantics, we have shown CGL is constructive in the sense that provable formulas 
correspond to realizable strategies, i.e., imperative programs executed in an
adversarial environment. We will revisit constructivity in Section 8 from the
perspective of proof terms as functional programs.


7 Operational Semantics 


The Curry-Howard interpretation of games is not complete without exploring the 
interpretation of proof simplification as normalization of functional programs. 
To this end, we now introduce a structural operational semantics for CGL proof 
terms. This semantics provides a view complementary to the realizability seman- 
tics: not only do provable formulas correspond to realizers, but proof terms can 
be directly executed as functional programs, resulting in a normal proof term. 
The chief subtlety of our operational semantics is that in contrast to realizer exe- 
cution, proof simplification is a static operation, and thus does not inspect game 
state. Thus the normal form of a proof which branches on the game state is, of 
necessity, also a proof which branches on the game state. This static-dynamic
phase separation need not be mysterious: it is analogous to the monadic phase
separation between a functional program which returns an imperative command 
vs. the execution of the returned command. While the primary motivation for 
our operational semantics is to complete the Curry-Howard interpretation, proof 
normalization is also helpful when implementing software tools which process 
proof artifacts, since code that consumes a normal proof is in general easier to 
implement than code that consumes an arbitrary proof. 

The operational semantics consist of two main judgments: M normal says that
M is a normal form, while $M \mapsto M'$ says that M reduces to term $M'$ in one step
of evaluation. A normal proof is allowed a case operation at the top level, either
$\langle\mathrm{case}\ A\ \text{of}\ \ell \Rightarrow B \mid r \Rightarrow C\rangle$ or $\langle\mathrm{case}_*\ A\ \text{of}\ s \Rightarrow B \mid g \Rightarrow C\rangle$. Normal proofs M
without state-casing are called simple, written M simp. The requirement that
cases are top-level ensures that proofs which differ only in where the case was
applied share a common normal form, and ensures that $\beta$-reduction is never
blocked by a case interceding between introduction-elimination pairs. Top-level
case analyses are analogous to case-tree normal forms in lambda calculi with
coproducts [4]. Reduction of proof terms is eager.


Definition 10 (Normal forms). We say M is simple, written M simp, if
eliminators occur only under binders. We say M is normal, written M normal, if
M simp or M has shape $\langle\mathrm{case}\ A\ \text{of}\ \ell \Rightarrow B \mid r \Rightarrow C\rangle$ or
$\langle\mathrm{case}_*\ A\ \text{of}\ s \Rightarrow B \mid g \Rightarrow C\rangle$, where A is a term such as $(\mathrm{split}\ [f, g]\ M)$ that inspects
the state. Subterms B and C need not be normal since they occur under the binding
of $\ell$ or r (resp. s or g).


That is, a normal term has no top-level beta-redexes, and state-dependent
cases are top-level. We consider rules [*]R, $[{:}*]$I, [?]I, and $\langle[{:=}]\rangle$I binding. Rules
such as $\langle*\rangle$I have multiple premisses but bind only one. While [*]R does not
introduce a proof variable, it is nonetheless considered binding to prevent divergence,
which is in keeping with a coinductive understanding of formula $[\alpha^*]\phi$. If we did
not care whether terms diverge, we could have made [*]R non-binding.

For the sake of space, this section focuses on the $\beta$-rules (Fig. 5). The full
calculus, given in the extended version [12], includes structural and
commuting-conversion rules, as well as what we call monotonicity conversion rules: a proof
term $M \circ_p N$ is simplified by structural recursion on M. The capture-avoiding
substitution of M for p in N is written $[M/p]N$ (Lemma 6). The propositional
cases $\lambda\phi\beta$, $\lambda\beta$, case$\beta$L, case$\beta$R, $\pi_1\beta$, and $\pi_2\beta$ are standard reductions for
applications, cases, and projections. Projection terms $\pi_1 M$ and $\pi_2 M$ should not
be confused with projection realizers $\pi_L(a)$ and $\pi_R(a)$. Rule unpack$\beta$ makes the
witness of an existential available in its client as a ghost variable.
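
To make the propositional reductions concrete, here is a small sketch of an eager reducer for applications, projections, and cases. The tuple encoding and the function names are hypothetical, chosen only for illustration, and the substitution assumes bound names never clash with free names of the substituted term, so it sidesteps the renaming that full capture avoidance would need.

```python
# Proof terms as tagged tuples (a hypothetical encoding, not the paper's syntax):
#   ("var", p)  ("lam", p, body)  ("app", m, n)  ("pair", m, n)
#   ("proj", i, m) with i in {1, 2}  ("inj", side, m) with side in {"l", "r"}
#   ("case", a, l, b, r, c)  meaning  case a of l => b | r => c

def subst(n, p, m):
    """Return [n/p]m. Assumes bound names in m are distinct from the free names
    of n, so no renaming is needed to avoid capture."""
    tag = m[0]
    if tag == "var":
        return n if m[1] == p else m
    if tag == "lam":
        return m if m[1] == p else ("lam", m[1], subst(n, p, m[2]))
    if tag in ("app", "pair"):
        return (tag, subst(n, p, m[1]), subst(n, p, m[2]))
    if tag in ("proj", "inj"):
        return (tag, m[1], subst(n, p, m[2]))
    _, a, l, b, r, c = m  # the remaining tag is "case"
    return ("case", subst(n, p, a),
            l, b if l == p else subst(n, p, b),
            r, c if r == p else subst(n, p, c))

def step(m):
    """Perform one eager reduction step, or return None if m cannot step.
    Bodies of binders (lambda bodies, case branches) are never reduced."""
    tag = m[0]
    if tag in ("app", "pair"):
        for i in (1, 2):  # structural rules: reduce subterms left to right
            s = step(m[i])
            if s is not None:
                return m[:i] + (s,) + m[i + 1:]
        if tag == "app" and m[1][0] == "lam":      # beta rule for application
            return subst(m[2], m[1][1], m[1][2])
    elif tag in ("proj", "inj"):
        s = step(m[2])
        if s is not None:
            return (tag, m[1], s)
        if tag == "proj" and m[2][0] == "pair":    # beta rules for projections
            return m[2][m[1]]
    elif tag == "case":
        _, a, l, b, r, c = m
        s = step(a)
        if s is not None:
            return ("case", s, l, b, r, c)
        if a[0] == "inj":                          # beta rules for cases
            return subst(a[2], l, b) if a[1] == "l" else subst(a[2], r, c)
    return None

def normalize(m):
    """Iterate step until no rule applies."""
    while (s := step(m)) is not None:
        m = s
    return m
```

A case whose scrutinee is neither an injection nor reducible is left in place by `step`, loosely mirroring how a state-dependent case remains in a normal proof.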

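Rules FP$\beta$, rep$\beta$, and for$\beta$ reduce introductions and eliminations of loops.
Rule FP$\beta$, which reduces a proof FP(A, s. B, g. C), says that if $\alpha^*$ has already
terminated according to A, then B proves the postcondition. Otherwise the inductive
step C applies, but every reference to the IH g is transformed to a recursive
application of FP. If A uses only $\langle*\rangle$S and $\langle*\rangle$G, then FP(A, s. B, g. C) reduces
to a simple term; if A uses $\langle*\rangle$I, then FP(A, s. B, g. C) reduces to a case.
Rule rep$\beta$ says loop induction (M rep p : J. N in O) reduces to a delayed pair
of the "stop" and "go" cases, where the "go" case first shows $[\alpha]J$, for loop
invariant J, then expands $J \to [\alpha^*]\phi$ in the postcondition. Note the laziness of
[roll] is essential for normalization: when (M rep p : J. N in O) is understood
as a coinductive proof, it is clear that normalization would diverge if rep$\beta$ were
applied indefinitely. Rule for$\beta$ for $\mathrm{for}(p : \varphi(M) = A;\ q.\ B;\ C)\{\alpha\}$ checks whether
the termination metric M has reached terminal value 0. If so, the loop $\langle\mathrm{stop}\rangle$s
and A proves it has converged. Otherwise, we remember M's value in a ghost term
$M_0$, and $\langle\mathrm{go}\rangle$ forward, supplying A and $\langle rr, r\rangle$ to satisfy the preconditions of
inductive step B, then execute the loop $\mathrm{for}(p : \varphi(M) = \pi_1 t;\ q.\ B;\ C)\{\alpha\}$ in the
postcondition. Rule for$\beta$ reflects the fact that the exact number of iterations is
state dependent.

$$\lambda\phi\beta\quad (\lambda p : \phi.\ M)\ N \mapsto [N/p]M \qquad\qquad \text{case}\beta\text{L}\quad \langle\mathrm{case}\ \langle\ell{\cdot}\ A\rangle\ \text{of}\ \ell \Rightarrow B \mid r \Rightarrow C\rangle \mapsto [A/\ell]B$$

$$\lambda\beta\quad (\lambda x : \mathbb{Q}.\ M)\ f \mapsto M_x^f \qquad\qquad \text{case}\beta\text{R}\quad \langle\mathrm{case}\ \langle r{\cdot}\ A\rangle\ \text{of}\ \ell \Rightarrow B \mid r \Rightarrow C\rangle \mapsto [A/r]C$$

$$\pi_1\beta\quad \langle[\pi_1\ \langle[M, N]\rangle]\rangle \mapsto M \qquad \pi_2\beta\quad \langle[\pi_2\ \langle[M, N]\rangle]\rangle \mapsto N \qquad \text{unroll}\beta\quad [\mathrm{unroll}\ [\mathrm{roll}\ M]] \mapsto M$$

$$\text{unpack}\beta\quad \mathrm{unpack}(\langle f_x^y\ {:}{*}\ q.\ M\rangle, py.\ N) \mapsto (\mathrm{Ghost}[x = f_x^y](q.\ [M/p]N))_x^y$$

$$\text{FP}\beta\quad \mathrm{FP}(D, s.\ B, g.\ C) \mapsto \langle\mathrm{case}_*\ D\ \text{of}\ s \Rightarrow B \mid g \Rightarrow [(g \circ_z \mathrm{FP}(z, s.\ B, g.\ C))/g]C\rangle$$

$$\text{rep}\beta\quad (M\ \mathrm{rep}\ p : J.\ N\ \text{in}\ O) \mapsto [\mathrm{roll}\ \langle M, ([M/p]N) \circ_q (q\ \mathrm{rep}\ p : J.\ N\ \text{in}\ O)\rangle]$$

$$\text{for}\beta\quad \mathrm{for}(p : \varphi(M) = A;\ q.\ B;\ C)\{\alpha\} \mapsto \langle\mathrm{case}\ \mathrm{split}\ [M, 0]\ \text{of}\ \ell \Rightarrow \langle\mathrm{stop}\ [(A, \ell)/(p, q)]C\rangle \mid r \Rightarrow \mathrm{Ghost}[M_0 = M](rr.\ \langle\mathrm{go}\ (([A, \langle rr, r\rangle/p, q]B) \circ_t (\mathrm{for}(p : \varphi(M) = \pi_1 t;\ q.\ B;\ C)\{\alpha\}))\rangle)\rangle$$

Fig. 5. Operational semantics: $\beta$-rules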
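
The role of laziness in the loop rules can be illustrated with delayed computations (thunks). In this sketch, which is an analogy rather than the paper's calculus, `rep` pairs the current evidence with a thunk for the remaining unrollings, so each further unrolling happens only when it is demanded; the names `rep` and `unroll` are reused here purely for readability.

```python
def rep(evidence, step):
    """Unroll a loop proof by one layer: pair the current invariant evidence
    with a thunk (delayed computation) standing for the rest of the loop."""
    return (evidence, lambda: rep(step(evidence), step))

def unroll(proof, k):
    """Force exactly k further unrollings, as if Demon demanded k repetitions,
    and return the evidence available afterwards."""
    for _ in range(k):
        proof = proof[1]()
    return proof[0]

# Record which steps are actually computed.
calls = []

def bump(n):
    calls.append(n)
    return n + 1

proof = rep(0, bump)
assert calls == []            # building the proof forces no unrolling
assert unroll(proof, 3) == 3
assert calls == [0, 1, 2]     # only the three demanded steps were computed
```

An eager version that computed every unrolling up front would never return, which is the divergence that the binding treatment of [roll] is meant to rule out.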

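We discuss the structural, commuting conversion, and monotonicity conversion
rules for left injections as an example, with the full calculus in [12].
Structural rule $\langle\ell{\cdot}\rangle$S evaluates term M under an injector. Commuting conversion rule
$\langle\ell{\cdot}\rangle$C normalizes an injection of a case to a case with injectors on each branch.
Monotonicity conversion rule $\langle\ell{\cdot}\rangle\circ$ simplifies a monotonicity proof of an injection
to an injection of a monotonicity proof.


$$\langle\ell{\cdot}\rangle\text{S}\quad \frac{M \mapsto M'}{\langle\ell{\cdot}\ M\rangle \mapsto \langle\ell{\cdot}\ M'\rangle}$$

$$\langle\ell{\cdot}\rangle\text{C}\quad \langle\ell{\cdot}\ \langle\mathrm{case}\ A\ \text{of}\ p \Rightarrow B \mid q \Rightarrow C\rangle\rangle \mapsto \langle\mathrm{case}\ A\ \text{of}\ p \Rightarrow \langle\ell{\cdot}\ B\rangle \mid q \Rightarrow \langle\ell{\cdot}\ C\rangle\rangle$$

$$\langle\ell{\cdot}\rangle\circ\quad \langle\ell{\cdot}\ M\rangle \circ_p N \mapsto \langle\ell{\cdot}\ (M \circ_p N)\rangle$$


Fig. 6. Operational semantics: structural, commuting conversion, monotonicity rules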


8 Theory: Constructivity 


We now complete the study of CGL's constructivity. We validate the operational
semantics on proof terms by proving that progress and preservation hold, and
thus the CGL proof calculus is sound as a type system for the functional
programming language of CGL proof terms.


Lemma 7 (Progress). If $\cdot \vdash M : \phi$, then either M is normal or $M \mapsto M'$ for
some $M'$.


Summary. By induction on the proof term M. If M is an introduction rule,
by the inductive hypotheses the subterms are well-typed. If they are all simple,
then M simp. If some subterm (not under a binder) steps, then M steps by a
structural rule. Otherwise some subterm is an irreducible case expression not under
a binder, and it lifts by the commuting conversion rule. If M is an elimination rule,
structural and commuting conversion rules are applied as above. Otherwise by Def. 10
the subterm is an introduction rule, and M reduces with a $\beta$-rule. Lastly, if M
has form $A \circ_x B$ and A simp, then by Def. 10 A is an introduction form, thus
reduced by some monotonicity conversion rule.


Lemma 8 (Preservation). Let $\mapsto^*$ be the reflexive, transitive closure of the
$\mapsto$ relation. If $\cdot \vdash M : \phi$ and $M \mapsto^* M'$, then $\cdot \vdash M' : \phi$.


Summary. Induct on the derivation $M \mapsto^* M'$, then induct on $M \mapsto M'$. The $\beta$
cases follow by Lemma 6 (for base constructs), and Lemma 6 and Lemma 2 (for
assignments). C-rules and $\circ$-rules lift across binders, soundly by W. S-rules are
direct by IH.


We gave two understandings of proofs in CGL, as imperative strategies and 
as functional programs. We now give a final perspective: CGL proofs support 
synthesis in principle, one of our main motivations. Formally, the Existential 
Property (EP) and Disjunction Property (DP) justify synthesis [18] for exis- 
tentials and disjunctions: whenever an existential or disjunction has a proof, 
then we can compute some instance or disjunct that has a proof. We state and 
prove an EP and DP for CGL, then introduce a Strategy Property, their coun- 
terpart for synthesizing strategies from game modalities. It is important to our 
EP that terms are arbitrary computable functions, because more simplistic term 
languages are often too weak to witness the existentials they induce. 


Example 1 (Rich terms help). Formulas over polynomial terms can have non- 
polynomial witnesses. 


Let $\phi \equiv (x = y \wedge x \ge 0) \vee (x = -y \wedge x < 0)$. Then $f = |x|$ witnesses $\exists y : \mathbb{Q}\ \phi$.
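
This witness can be spot-checked numerically. The sketch below assumes the formula as reconstructed here, with disjuncts x = y ∧ x ≥ 0 and x = −y ∧ x < 0, and models rationals with `Fraction`; `phi` and `witness` are illustrative names.

```python
from fractions import Fraction

def phi(x, y):
    """The quantifier-free body: (x = y and x >= 0) or (x = -y and x < 0)."""
    return (x == y and x >= 0) or (x == -y and x < 0)

def witness(x):
    """The non-polynomial witness term f = |x| for the existential over y."""
    return abs(x)

samples = [Fraction(-7, 3), Fraction(-1), Fraction(0), Fraction(1, 2), Fraction(4)]
assert all(phi(x, witness(x)) for x in samples)
```

No single polynomial in x agrees with |x| on both the negative and positive rationals, which is why a purely polynomial term language could not supply this witness.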


Lemma 9 (Existential Property). If $\Gamma \vdash M : (\exists x : \mathbb{Q}\ \phi)$ then there exists a
term f and realizer b such that for all $(a, \omega) \in [\![\bigwedge\Gamma]\!]$, we have $(b\,a, \omega_x^{f(\omega)}) \in [\![\phi]\!]$.


Proof. By Theorem 1, the sequent $(\Gamma \vdash \exists x : \mathbb{Q}\ \phi)$ is valid. Since $(a, \omega) \in [\![\bigwedge\Gamma]\!]$,
then by the definition of sequent validity, there exists a common realizer c such
that $(c\,a, \omega) \in [\![\exists x : \mathbb{Q}\ \phi]\!]$. Now let $f = \pi_L(c\,a)$ and $b = \pi_R(c\,a)$ and the result is
immediate by the semantics of existentials.

Disjunction strategies can depend on the state, so naïve DP does not hold.


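Example 2 (Naive DP). When $\Gamma \vdash M : (\phi \vee \psi)$ there need not be N such that
$\Gamma \vdash N : \phi$ or $\Gamma \vdash N : \psi$.


Consider $\phi \equiv x > 0$ and $\psi \equiv x < 1$. Then $\cdot \vdash \mathrm{split}\ [x, 0]\ () : (\phi \vee \psi)$, but
neither x < 1 nor x > 0 is valid, let alone provable.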
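
The state dependence is easy to see in executable form. In this sketch, `split` is a hypothetical stand-in for the comparison decided by rule split (0 for the left outcome f ≤ g, 1 for the right outcome f > g), and `disjunct_for` is an illustrative helper that reports which disjunct the strategy establishes in a given state.

```python
from fractions import Fraction

def split(f, g):
    """Decide a comparison of rationals: 0 means f <= g, 1 means f > g."""
    return 0 if f <= g else 1

def disjunct_for(x):
    """Pick a true disjunct of (x > 0) or (x < 1) by inspecting the state x."""
    return "x < 1" if split(x, Fraction(0)) == 0 else "x > 0"

# Different states force different disjuncts, so no state-independent choice works.
assert disjunct_for(Fraction(-3)) == "x < 1"
assert disjunct_for(Fraction(2)) == "x > 0"
```

Neither answer is correct in every state: x > 0 fails at x = −3 and x < 1 fails at x = 2, so the choice must be a computable function of the state.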


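Lemma 10 (Disjunction Property). When $\Gamma \vdash M : \phi \vee \psi$ there exists realizer
b and computable f, s.t. for every $\omega$ and a such that $(a, \omega) \in [\![\bigwedge\Gamma]\!]$, either
$f(\omega) = 0$ and $(\pi_L(b), \omega) \in [\![\phi]\!]$, else $f(\omega) = 1$ and $(\pi_R(b), \omega) \in [\![\psi]\!]$.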


Proof. By Theorem 1, the sequent $\Gamma \vdash \phi \vee \psi$ is valid. Since $(a, \omega) \in [\![\bigwedge\Gamma]\!]$,
then by the definition of sequent validity, there exists a common realizer c such
that $(c\,a, \omega) \in [\![\phi \vee \psi]\!]$. Now let $f = \pi_L(c\,a)$ and $b = \pi_R(c\,a)$ and the result is
immediate by the semantics of disjunction.


Following the same approach, we generalize to a Strategy Property. In CGL, 
strategies are represented by realizers, which implement every computation made 
throughout the game. Thus, to show provable games have computable winning 
strategies, it suffices to exhibit realizers. 


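Theorem 2 (Active Strategy Property). If $\Gamma \vdash M : \langle\alpha\rangle\phi$, then there exists
a realizer b such that for all $\omega$ and realizers a with $(a, \omega) \in [\![\bigwedge\Gamma]\!]$, we have
$\{(b\,a, \omega)\}\langle\!\langle\alpha\rangle\!\rangle \subseteq [\![\phi]\!] \cup \{\top\}$.


Theorem 3 (Dormant Strategy Property). If $\Gamma \vdash M : [\alpha]\phi$, then there exists
a realizer b such that for all $\omega$ and realizers a with $(a, \omega) \in [\![\bigwedge\Gamma]\!]$, we have
$\{(b\,a, \omega)\}[[\alpha]] \subseteq [\![\phi]\!] \cup \{\top\}$.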


Summary. From proof term $M$ and Theorem 1, we have a realizer for formula
$\langle\alpha\rangle\phi$ or $[\alpha]\phi$, respectively. We proceed by induction on $\alpha$: the realizer $b\,a$ contains
all realizers applied in the inductive cases composed with their continuations that
prove $\phi$ in each base case.


While these proofs, especially EP and DP, are short and direct, this is by
design: the proofs become simple because we adopted a realizability semantics.
The challenge in developing CGL lay not in the proofs of this section but in
developing the semantics and adapting the proof calculus and theory to that
semantics.


9 Conclusion and Future Work 


In this paper, we developed a Constructive Game Logic CGL, from syntax and re- 
alizability semantics to a proof calculus and operational semantics on the proof 
terms. We developed two understandings of proofs as programs: semantically, 
every proof of game winnability corresponds to a realizer which computes the 
game’s winning strategy, while the language of proof terms is also a functional
programming language where proofs reduce to their normal forms according to
the operational semantics. We completed the Curry-Howard interpretation for 
games by showing Existential, Disjunction, and Strategy properties: programs 
can be synthesized that decide which instance, disjunct, or moves are taken in 
existentials, disjunctions, and games. In summary, we have developed the most 
comprehensive Curry-Howard interpretation of any program logic to date, for a 
much more expressive logic than prior work [32]. Because CGL contains construc- 
tive Concurrent DL and first-order DL as strict fragments, we have provided a 
comprehensive Curry-Howard interpretation for them in one fell swoop. The key 
insights behind CGL should apply to the many dynamic and Hoare logics used 
in verification today. 

Synthesis is the immediate application of CGL. Motivations for synthesis 
include security games [40], concurrent programs with demonic schedulers (Con- 
current Dynamic Logic), and control software for safety-critical cyber-physical 
systems such as cars and planes. In general, any kind of software program which 
must operate correctly in an adversarial environment can benefit from game logic 
verification. The proofs of Theorem 2 and Theorem 3 constitute an (on-paper) 
algorithm which performs synthesis of guaranteed-correct strategies from game 
proofs. The first future work is to implement this algorithm in code, providing 
much-needed assurance for software which is often mission-critical or safety- 
critical. This paper focused on discrete CGL with one numeric type simply be- 
cause any further features would distract from the core features. Real applica- 
tions come from many domains which add features around this shared core. 

The second future work is to extend CGL to hybrid games, which provide 
compelling applications from the domain of adversarial cyber-physical systems. 
This future work will combine the novel features of CGL with those of the classical 
logic dGL. The primary task is to define a constructive semantics for differential 
equations and to give constructive interpretations to the differential equation 
rules of dGL. Previous work on formalizations of differential equations [34] sug- 
gests differential equations can be treated constructively. In principle, existing 
proofs in dGL might happen to be constructive, but this does not obviate the 
present work. On the contrary, once a game logic proof is shown to fall in the 
constructive fragment, our work gives a correct synthesis guarantee for it too!


Abstract. Interprocedural data-flow analyses form an expressive and
useful paradigm of numerous static analysis applications, such as live
variables analysis, alias analysis and null pointers analysis. The most
widely-used framework for interprocedural data-flow analysis is IFDS,
which encompasses distributive data-flow functions over a finite domain.
On-demand data-flow analyses restrict the focus of the analysis to
specific program locations and data facts. This setting provides a natural
split between (i) an offline (or preprocessing) phase, where the program
is partially analyzed and analysis summaries are created, and (ii) an
online (or query) phase, where analysis queries arrive on demand and the
summaries are used to speed up answering queries.

In this work, we consider on-demand IFDS analyses where the queries 
concern program locations of the same procedure (aka same-context 
queries). We exploit the fact that flow graphs of programs have low 
treewidth to develop faster algorithms that are space and time optimal 
for many common data-flow analyses, in both the preprocessing and the 
query phase. We also use treewidth to develop query solutions that are 
embarrassingly parallelizable, i.e. the total work for answering each query 
is split to a number of threads such that each thread performs only a 
constant amount of work. Finally, we implement a static analyzer based 
on our algorithms, and perform a series of on-demand analysis experi- 
ments on standard benchmarks. Our experimental results show a dras- 
tic speed-up of the queries after only a lightweight preprocessing phase, 
which significantly outperforms existing techniques.


1 Introduction


Static data-flow analysis. Static program analysis is a fundamental approach 
for both analyzing program correctness and performing compiler optimizations 
[25,39,44,64,30]. Static data-flow analyses associate with each program location
a set of data-flow facts which are guaranteed to hold under all program ex- 
ecutions, and these facts are then used to reason about program correctness, 
report erroneous behavior, and optimize program execution. Static data-flow 
analyses have numerous applications, such as in pointer analysis (e.g., points-to
analysis and detection of null pointer dereferencing) [65,61,62,66,67,69], in
detecting privacy and security issues (e.g., taint analysis, SQL injection analysis)
[3,37,31,33,47,40], as well as in compiler optimizations (e.g., constant propagation,
reaching definitions, register allocation) [50,32,55,13,2].


Interprocedural analysis and the IFDS framework. Data-flow analyses fall in two 
large classes: intraprocedural and interprocedural. In the former, each procedure 
of the program is analyzed in isolation, ignoring the interaction between proce- 
dures which occurs due to parameter passing/return. In the latter, all procedures 
of the program are analyzed together, accounting for such interactions, which 
leads to results of increased precision, and hence is often preferable to intraprocedural
analysis [49,54,59,60]. To filter out false results, interprocedural analyses
typically employ call-context sensitivity, which ensures that the underlying exe- 
cution paths respect the calling context of procedure invocations. One of the most 
widely used frameworks for interprocedural data-flow analysis is the framework 
of Interprocedural Finite Distributive Subset (IFDS) problems [50], which offers 
a unified formulation of a wide class of interprocedural data-flow analyses as a 
reachability problem. This elegant algorithmic formulation of data-flow analysis 
has been a topic of active study, allowing various subsequent practical improvements
[36,45,8,3,47,56] and implementations in prominent static analysis tools
such as Soot [7] and WALA [1].


On-demand analysis. Exhaustive data-flow analysis is computationally expensive
and often unnecessary. Hence, a topic of great interest in the community is
that of on-demand data-flow analysis [4,27,36,51,48,68,45]. On-demand analyses
have several applications, such as (quoting from [36,48]) (i) narrowing down the
focus to specific points of interest, (ii) narrowing down the focus to specific
data-flow facts of interest, (iii) reducing work in preliminary phases, (iv) sidestepping
incremental updating problems, and (v) offering demand analysis as a
user-level operation. On-demand analysis is also extremely useful for speculative
optimizations in just-in-time compilers [24,43,5,29], where dynamic information
can dramatically increase the precision of the analysis. In this setting, it is crucial
that the on-demand analysis runs fast, to incur as little overhead as possible.


Example 1. As a toy motivating example, consider the partial program shown in
Figure 1, compiled with a just-in-time compiler that uses speculative optimizations.
Whether the compiler must compile the expensive function h depends on
whether x is null in line 6. Performing a null-pointer analysis from the entry of
f reveals that x might be null in line 6. Hence, if the decision to compile h relies
only on an offline static analysis, h is always compiled, even when not needed.

 1 void f(int b){
 2   int *x = NULL, *y = NULL;
 3   if(b > 1)
 4     y = &b;
 5   g(x,y);
 6   if(x == NULL)
 7     h();
 8 }
 9 void g(int *&x, int *y){
10   x = y;
11 }
12 void h(){
13   //An expensive
14   //function
15 }

Fig. 1: A partial C++ program.

Now consider the case where the execution of the program is in line 4, and
at this point the compiler decides on whether to compile h. It is clear that
given this information, x cannot be null in line 6 and thus h does not have
to be compiled. As we have seen above, this decision cannot be made based
on offline analysis. On the other hand, an on-demand analysis starting from the
current program location will correctly conclude that x is not null in line 6. Note,
however, that this decision is made by the compiler during runtime. Hence, such
an on-demand analysis is useful only if it can be performed extremely fast. It
is also highly desirable that the time for running this analysis is predictable, so
that the compiler can decide whether to run the analysis or simply compile h
proactively.
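The difference between the two starting points can be replayed with a brute-force path exploration of lines 2 to 6 of f in Figure 1. The sketch below is only an illustration of the example, not the algorithm developed in this paper; the abstract state `(x_null, y_null)` and the names `SUCC`, `step` and `x_may_be_null_at_line_6` are hypothetical.

```python
# Control-flow successors of lines 2..5 of f in Figure 1.
SUCC = {2: [3], 3: [4, 5], 4: [5], 5: [6]}

def step(line, state):
    """Effect of executing `line` on the abstract state (x_null, y_null)."""
    x_null, y_null = state
    if line == 2:            # int *x = NULL, *y = NULL;
        return (True, True)
    if line == 4:            # y = &b;
        return (x_null, False)
    if line == 5:            # g(x, y) performs x = y
        return (y_null, y_null)
    return state             # line 3 only branches

def x_may_be_null_at_line_6(start, state=(True, True)):
    """True iff some path from `start` reaches line 6 with x null."""
    if start == 6:
        return state[0]
    after = step(start, state)
    return any(x_may_be_null_at_line_6(nxt, after) for nxt in SUCC[start])

from_entry = x_may_be_null_at_line_6(2)   # offline view, from the entry of f
from_line_4 = x_may_be_null_at_line_6(4)  # on-demand view, from line 4
```

Starting from the entry, the path that skips line 4 leaves x null, while every path from line 4 assigns a non-null y to x.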


The techniques we develop in this paper answer the above challenges rigor- 
ously. Our approach exploits a key structural property of flow graphs of pro- 
grams, called treewidth. 


Treewidth of programs. A very well-studied notion in graph theory is the con- 
cept of treewidth of a graph, which is a measure of how similar a graph is to 
a tree (a graph has treewidth 1 precisely if it is a tree) [52]. On one hand the 
treewidth property provides a mathematically elegant way to study graphs, and 
on the other hand there are many classes of graphs which arise in practice and 
have constant treewidth. The most important example is that the flow graphs
of goto-free programs in many classic programming languages have constant
treewidth [63]. The low treewidth of flow graphs has also been confirmed experimentally
for programs written in Java [84], C [38], Ada and Solidity [15].

Treewidth has important algorithmic implications, as many graph problems
that are hard to solve in general admit efficient solutions on graphs of low
treewidth. In the context of program analysis, this property has been exploited to
develop improvements for register allocation [63,9] (a technique implemented in
the Small Device C Compiler [28]), cache management [18], on-demand algebraic
path analysis [16], on-demand intraprocedural data-flow analysis of concurrent
programs and data-dependence analysis [14].


Problem statement. We focus on on-demand data-flow analysis in IFDS [50,36,48].
The input consists of a supergraph $G$ of $n$ vertices, a data-fact domain $D$ and a
data-flow transformer function $M$. Edges of $G$ capture control-flow within each
procedure, as well as procedure invocations and returns. The set $D$ defines the
domain of the analysis, and contains the data facts to be discovered by the analysis
for each program location. The function $M$ associates with every edge $(u, v)$
of $G$ a data-flow transformer $M(u, v) : 2^D \to 2^D$. In words, $M(u, v)$ defines the
set of data facts that hold at $v$ in some execution that transitions from $u$ to $v$,
given the set of data facts that hold at $u$.

On-demand analysis brings a natural separation between (i) an offline (or 
preprocessing) phase, where the program is partially analyzed, and (ii) an online 
(or query) phase, where on-demand queries are handled. The task is to preprocess 
the input in the offline phase, so that in the online phase, the following types of 
on-demand queries are answered efficiently: 

1. A pair query has the form $(u, d_1, v, d_2)$, where $u, v$ are vertices of $G$ in the
same procedure, and $d_1, d_2$ are data facts. The goal is to decide if there exists
an execution that starts in $u$ and ends in $v$, and given that the data fact $d_1$
held at the beginning of the execution, the data fact $d_2$ holds at the end.
These are known as same-context queries and are very common in data-flow
analysis [23,50,16].

2. A single-source query has the form $(u, d_1)$, where $u$ is a vertex of $G$ and $d_1$
is a data fact. The goal is to compute for every vertex $v$ that belongs to the
same procedure as $u$, all the data facts that might hold in $v$ as witnessed by
executions that start in $u$ and assuming that $d_1$ holds at the beginning of
each such execution.


Previous results. The on-demand analysis problem admits a number of solutions
that lie in the preprocessing/query spectrum. On the one end, the preprocessing
phase can be disregarded, and every on-demand query be treated anew. Since
each query starts a separate instance of IFDS, the time to answer it is $O(n \cdot |D|^3)$,
for both pair and single-source queries [50]. On the other end, all possible queries
can be pre-computed and cached in the preprocessing phase in time $O(n^2 \cdot |D|^3)$,
after which each query costs time proportional to the size of the output (i.e.,
$O(1)$ for pair queries and $O(n \cdot |D|)$ for single-source queries). Note that this full
preprocessing also incurs a cost of $O(n^2 \cdot |D|^2)$ in space for storing the cache table,
which is often prohibitive. On-demand analysis was more thoroughly studied
in [36]. The main idea is that, instead of pre-computing the answer to all possible
queries, the analysis results obtained by handling each query are memoized in a
cache table, and are used for speeding up the computation of subsequent queries.
This is a heuristic-based approach that often works well in practice; however,
the only guarantee provided is that of same-worst-case-complexity, which states
that in the worst case, the algorithm uses $O(n^2 \cdot |D|^3)$ time and $O(n^2 \cdot |D|^2)$
space, similarly to the complete preprocessing case. This guarantee is inadequate
for runtime applications such as the example of Figure 1, as it would require
either (i) to run a full analysis, or (ii) to run a partial analysis which might
wrongly conclude that h is reachable, and thus compile it. Both cases incur a
large runtime overhead, either because we run a full analysis, or because we
compile an expensive function.


Our contributions. We develop algorithms for on-demand IFDS analyses that 
have strong worst-case time complexity guarantees and thus lead to more pre- 
dictable performance than mere heuristics. The contributions of this work are 
as follows: 


1. We develop an algorithm that, given a program represented as a supergraph
of size $n$ and a data fact domain $D$, solves the on-demand same-context IFDS
problem while spending (i) $O(n \cdot |D|^3)$ time in the preprocessing phase, and
(ii) $O(\lceil |D| / \log n \rceil)$ time for a pair query and $O(n \cdot |D|^2 / \log n)$ time for a
single-source query in the query phase. Observe that when $|D| = O(1)$, the
preprocessing and query times are proportional to the size of the input and
outputs, respectively, and are thus optimal.⁵ In addition, our algorithm uses
$O(n \cdot |D|^2)$ space at all times, which is proportional to the size of the input,
and is thus space optimal. Hence, our algorithm not only improves upon
previous state-of-the-art solutions, but also ensures optimality in both time
and space.

2. We also show that after our one-time preprocessing, each query is embarrassingly
parallelizable, i.e., every bit of the output can be produced by a
single thread in $O(1)$ time. This makes our techniques particularly useful for
speculative optimizations, since the analysis is guaranteed to take constant
time and thus incur little runtime overhead. Although the parallelization of
data-flow analysis has been considered before [41,42,53], this is the first work
to obtain solutions that go beyond heuristics and offer theoretical guarantees.
Moreover, this is a rather surprising result, given that general IFDS is
known to be P-complete.

3. We implement our algorithms in a static analyzer and experimentally evaluate
their performance on various static analysis clients over a standard set
of benchmarks. Our experimental results show that after only a lightweight
preprocessing, we obtain a significant speedup in the query phase compared
to standard on-demand techniques in the literature. Also, our parallel implementation
achieves a speedup close to the theoretical optimum, which illustrates
that the perfect parallelization of the problem is realized by our
approach in practice.

Recently, we exploited the low-treewidth property of programs to obtain
faster algorithms for algebraic path analysis and intraprocedural reachability
[21]. Data-flow analysis can be reduced to these problems. Hence, the algorithms
in [16,21] can also be applied to our setting. However, our new approach
has two important advantages: (i) we show how to answer queries in a perfectly
parallel manner, and (ii) reducing the problem to algebraic path properties and
then applying the algorithms in [16,21] yields $O(n \cdot |D|^3)$ preprocessing time and
$O(n \cdot \log n \cdot |D|^2)$ space, and has pair and single-source query time $O(|D|)$ and
$O(n \cdot |D|^2)$. Hence, our space usage and query times are better by a factor of
$\log n$.⁶ Moreover, when considering the complexity wrt $n$, i.e. considering $D$ to
be a constant, these results are optimal wrt both time and space. Hence, no
further improvement is possible.

⁵ Note that we count the input itself as part of the space usage.


Remark. Note that our approach does not apply to arbitrary CFL reachability 
in constant treewidth. In addition to the treewidth, our algorithms also exploit 
specific structural properties of IFDS. In general, small treewidth alone does not 
improve the complexity of CFL reachability [14]. 


2 Preliminaries 


Model of computation. We consider the standard RAM model with word size
$W = O(\log n)$, where $n$ is the size of our input. In this model, one can store
$W$ bits in one word (aka “word tricks”) and arithmetic and bitwise operations
between pairs of words can be performed in $O(1)$ time. In practice, word size is
a property of the machine and not the analysis. Modern machines have words
of size at least 64. Since the size of real-world input instances never exceeds $2^{64}$,
the assumption of word size $W = O(\log n)$ is well-realized in practice and no
additional effort is required by the implementer to account for $W$ in the context
of data flow analysis.
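As a rough illustration of such word tricks, the sketch below packs a set of data facts into the bits of one integer, so that a union or intersection of two fact sets becomes a single bitwise operation. Python integers are unbounded, so this only mimics a machine word; the helper names `pack` and `unpack` and the fact names are hypothetical.

```python
def pack(facts, index):
    """Encode a set of facts as a bitset: bit index[d] is set iff d is in facts."""
    word = 0
    for d in facts:
        word |= 1 << index[d]
    return word

def unpack(word, index):
    """Decode a bitset back into the set of facts it represents."""
    return {d for d, i in index.items() if (word >> i) & 1}

index = {"a": 0, "b": 1, "c": 2}   # position of each fact inside the word
union_word = pack({"a"}, index) | pack({"c"}, index)            # one OR
common_word = pack({"a", "b"}, index) & pack({"b", "c"}, index)  # one AND
```

With $W$ facts per word, a set over $D$ occupies about $\lceil |D| / W \rceil$ words, which is where a $\log n$ factor can be saved.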


Graphs. We consider directed graphs $G = (V, E)$ where $V$ is a finite set of
vertices and $E \subseteq V \times V$ is a set of directed edges. We use the term graph to
refer to directed graphs and will explicitly mention if a graph is undirected.
For two vertices $u, v \in V$, a path $P$ from $u$ to $v$ is a finite sequence of vertices
$P = (w_i)_{i=0}^{k}$ such that $w_0 = u$, $w_k = v$ and for every $i < k$, there is an edge from
$w_i$ to $w_{i+1}$ in $E$. The length $|P|$ of the path $P$ is equal to $k$. In particular, for
every vertex $u$, there is a path of length 0 from $u$ to itself. We write $P : u \rightsquigarrow v$ to
denote that $P$ is a path from $u$ to $v$ and $u \rightsquigarrow v$ to denote the existence of such a
path, i.e. that $v$ is reachable from $u$. Given a set $V' \subseteq V$ of vertices, the induced
subgraph of $G$ on $V'$ is defined as $G[V'] = (V', E \cap (V' \times V'))$. Finally, the graph
$G$ is called bipartite if the set $V$ can be partitioned into two sets $V_1, V_2$, so that
every edge has one end in $V_1$ and the other in $V_2$, i.e. $E \subseteq (V_1 \times V_2) \cup (V_2 \times V_1)$.
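The definitions above translate directly into code. The following Python sketch is one possible rendering, with graphs given as a set of vertices and a set of directed edge pairs; the function names are hypothetical.

```python
from collections import deque

def reachable(edges, u):
    """All v with a path from u to v; u itself counts via the length-0 path."""
    seen = {u}
    queue = deque([u])
    while queue:
        w = queue.popleft()
        for (a, b) in edges:
            if a == w and b not in seen:
                seen.add(b)
                queue.append(b)
    return seen

def induced_subgraph(vertices, edges, subset):
    """G[V'] = (V', E restricted to V' x V')."""
    kept = subset & vertices
    return kept, {(a, b) for (a, b) in edges if a in kept and b in kept}

def is_bipartition(edges, v1, v2):
    """Check that every edge has one end in v1 and the other in v2."""
    return all((a in v1 and b in v2) or (a in v2 and b in v1) for (a, b) in edges)

V = {1, 2, 3, 4}
E = {(1, 2), (2, 3), (4, 1)}
```

For this small graph, vertex 4 is not reachable from 1, and the split {1, 3} versus {2, 4} is a valid bipartition.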


2.1 The IFDS Framework 


IFDS [50] is a ubiquitous and general framework for interprocedural data-flow 
analyses that have finite domains and distributive flow functions. It encompasses 
a wide variety of analyses, including truly-live variables, copy constant propa- 
gation, possibly-uninitialized variables, secure information-flow, and gen/kill or 
bitvector problems such as reaching definitions, available expressions and live 
variables [50,7]. IFDS obtains interprocedurally precise solutions. In contrast to
intraprocedural analysis, in which precise denotes “meet-over-all-paths”, interprocedurally
precise solutions only consider valid paths, i.e. paths in which when
a function reaches its end, control returns back to the site of the most recent
call [58].

⁶ This improvement is due to the differences in the preprocessing phase. Our algorithms
for the query phase are almost identical to our previous work.


Flow graphs and supergraphs. In IFDS, a program with $k$ procedures is specified
by a supergraph, i.e. a graph $G = (V, E)$ consisting of $k$ flow graphs $G_1, \ldots, G_k$,
one for each procedure, and extra edges modeling procedure-calls. Flow graphs
represent procedures in the usual way, i.e. they contain one vertex $v_i$ for each
statement $i$ and there is an edge from $v_i$ to $v_j$ if the statement $j$ may immediately
follow the statement $i$ in an execution of the procedure. The only exception is
that a procedure-call statement $i$ is represented by two vertices, a call vertex
$c_i$ and a return-site vertex $r_i$. The vertex $c_i$ only has incoming edges, and the
vertex $r_i$ only has outgoing edges. There is also a call-to-return-site edge from
$c_i$ to $r_i$. The call-to-return-site edges are included for passing intraprocedural
information, such as information about local variables, from $c_i$ to $r_i$. Moreover,
each flow graph $G_l$ has a unique start vertex $s_l$ and a unique exit vertex $e_l$.

The supergraph $G$ also contains the following edges for each procedure-call $i$
with call vertex $c_i$ and return-site vertex $r_i$ that calls a procedure $l$: (i) an interprocedural
call-to-start edge from $c_i$ to the start vertex of the called procedure,
i.e. $s_l$, and (ii) an interprocedural exit-to-return-site edge from the exit vertex
of the called procedure, i.e. $e_l$, to $r_i$.


Example 2. Figure 2 shows a simple C++ program on the left and its supergraph
on the right. Each statement $i$ of the program has a corresponding vertex $v_i$ in
the supergraph, except for statement 7, which is a procedure-call statement and
hence has a corresponding call vertex $c_7$ and return-site vertex $r_7$.


1 void f(int *&x, int *y){
2   y = new int(1);
3   y = new int(2);
4 }
5 int main(){
6   int *x, *y;
7   f(x,y);
8   *x += *y;
9 }

[Supergraph drawing not recoverable from the extracted text; its surviving labels include s_main, e_f, e_main and one call-to-return-site edge.]


Fig. 2: A C++ program (left) and its supergraph (right). 
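To make the construction concrete, the sketch below assembles a supergraph for the program of Figure 2 from its two flow graphs and its single call. The vertex names follow the statement numbering of the listing, under the assumption that lines 1 and 4 serve as the start and exit vertices of f and lines 5 and 9 as those of main; the function `build_supergraph` and the edge labels are hypothetical.

```python
def build_supergraph(flow_edges, calls, start, exit_):
    """Return labeled edges: flow edges plus the three edges added per call."""
    edges = [(u, v, "intra") for (u, v) in flow_edges]
    for (c, r, callee) in calls:
        edges.append((c, r, "call-to-return-site"))
        edges.append((c, start[callee], "call-to-start"))
        edges.append((exit_[callee], r, "exit-to-return-site"))
    return edges

# Flow graphs of f (statements 1-4) and main (statements 5-9).
flow_edges = [
    ("v1", "v2"), ("v2", "v3"), ("v3", "v4"),                 # f
    ("v5", "v6"), ("v6", "c7"), ("r7", "v8"), ("v8", "v9"),   # main
]
start = {"f": "v1", "main": "v5"}
exit_ = {"f": "v4", "main": "v9"}
calls = [("c7", "r7", "f")]   # statement 7 calls f

G = build_supergraph(flow_edges, calls, start, exit_)
```

Each call contributes exactly three extra edges, so the supergraph stays linear in the size of the program.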


Interprocedurally valid paths. Not every path in the supergraph $G$ can potentially
be realized by an execution of the program. Consider a path $P$ in $G$ and let $P'$
be the sequence of vertices obtained by removing every $v_i$ from $P$, i.e. $P'$ only
consists of $c_i$'s and $r_i$'s. Then, $P$ is called a same-context valid path if $P'$ can be
generated from $S$ in this grammar:

$$S \to c_i\ S\ r_i\ S \ \text{ for a procedure-call statement } i \quad \mid \quad \varepsilon$$

Moreover, $P$ is called an interprocedurally valid path or simply valid if $P'$ can be
generated from the nonterminal $S'$ in the following grammar:

$$S' \to S'\ c_i\ S \ \text{ for a procedure-call statement } i \quad \mid \quad S$$

For any two vertices u,v of the supergraph G, we denote the set of all interproce- 
durally valid paths from u to v by IVP(u, v) and the set of all same-context valid 
paths from u to v by SCVP(u, v). Informally, a valid path starts from a statement 
in a procedure p of the program and goes through a number of procedure-calls 
while respecting the rule that whenever a procedure ends, control should return 
to the return-site in its parent procedure. A same-context valid path is a valid 
path in which every procedure-call ends and hence control returns back to the 
initial procedure p in the same context. 
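Both grammars describe bracket-matching languages, so membership can be tested with a stack. The following sketch takes the projected sequence of call and return-site vertices, encoded as hypothetical pairs `("c", i)` and `("r", i)`, and reports whether it is valid and whether it is same-context valid.

```python
def classify(events):
    """Return (is_valid, is_same_context_valid) for a call/return sequence."""
    stack = []
    for kind, i in events:
        if kind == "c":
            stack.append(i)              # open call i
        elif not stack or stack.pop() != i:
            return (False, False)        # return without the matching latest call
    # Valid paths may leave calls open; same-context valid paths may not.
    return (True, not stack)
```

For example, a lone call is valid but not same-context valid, while a return that does not match the most recent open call is rejected by both grammars.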


IFDS [50]. An IFDS problem instance is a tuple $I = (G, D, F, M, \sqcap)$ where:

– $G = (V, E)$ is a supergraph as above.

– $D$ is a finite set, called the domain, and each $d \in D$ is called a data flow fact.

– The meet operator $\sqcap$ is either intersection or union.

– $F \subseteq 2^D \to 2^D$ is a set of distributive flow functions over $\sqcap$, i.e. for each
function $f \in F$ and every two sets of facts $D_1, D_2 \subseteq D$, we have $f(D_1 \sqcap D_2) =
f(D_1) \sqcap f(D_2)$.

– $M : E \to F$ is a map that assigns a distributive flow function to each edge
of the supergraph.

Let $P = (w_i)_{i=0}^{k}$ be a path in $G$, $e_i = (w_{i-1}, w_i)$ and $m_i = M(e_i)$. In other
words, the $e_i$'s are the edges appearing in $P$ and the $m_i$'s are their corresponding
distributive flow functions. The path function of $P$ is defined as
$pf_P := m_k \circ \cdots \circ m_2 \circ m_1$, where $\circ$ denotes function composition. The solution of $I$ is the
collection of values $\{\mathrm{MVP}_v\}_{v \in V}$:

$$\mathrm{MVP}_v := \bigsqcap_{P \in \mathrm{IVP}(s_{main}, v)} pf_P(D).$$

Intuitively, the solution is defined by taking meet-over-all-valid-paths. If the meet
operator is union, then $\mathrm{MVP}_v$ is the set of data flow facts that may hold at $v$,
when $v$ is reached in some execution of the program. Conversely, if the meet
operator is intersection, then $\mathrm{MVP}_v$ consists of data flow facts that must hold
at $v$ in every execution of the program that reaches $v$. Similarly, we define the
same-context solution of $I$ as the collection of values $\{\mathrm{MSCP}_v\}_{v \in V_{main}}$ defined as
follows:

$$\mathrm{MSCP}_v := \bigsqcap_{P \in \mathrm{SCVP}(s_{main}, v)} pf_P(D). \tag{1}$$

The intuition behind MSCP is similar to that of MVP, except that in $\mathrm{MSCP}_v$
we consider meet-over-same-context-paths (corresponding to runs that return to
the same stack state).


Remark 1. We note two points about the IFDS framework:

— As in [50], we only consider IFDS instances in which the meet operator 
is union. Instances with intersection can be reduced to union instances by 
dualization [50]. 

— For brevity, we are considering a global domain D, while in many applica- 
tions the domain is procedure-specific. This does not affect the generality of 
our approach and our algorithms remain correct for the general case where 
each procedure has its own dedicated domain. Indeed, our implementation 
supports the general case. 
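For the union case considered here, a path function and a meet over an explicitly given finite set of paths can be sketched as follows. In general there are infinitely many valid paths, so this is only an illustration of the definitions, not a way to compute the solution; the names are hypothetical, and the initial fact set is passed explicitly as a parameter.

```python
from functools import reduce

def path_function(edge_functions):
    """pf_P for edge functions m_1, ..., m_k listed in path order."""
    def pf(facts):
        # m_1 is applied first, m_k last, i.e. m_k after ... after m_1.
        return reduce(lambda acc, m: m(acc), edge_functions, frozenset(facts))
    return pf

def meet_over_paths(paths, initial):
    """Union of pf_P(initial) over the listed paths."""
    result = frozenset()
    for path in paths:
        result |= path_function(path)(initial)
    return result

# Three toy gen/kill flow functions over the facts a and b.
gen_a = lambda s: s | {"a"}
kill_a = lambda s: s - {"a"}
gen_b = lambda s: s | {"b"}

paths = [[gen_a], [gen_a, kill_a, gen_b]]
solution = meet_over_paths(paths, frozenset())
```

The first path contributes the fact a and the second contributes b, so under union both may hold at the target.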


Succinct representations. A distributive function $f : 2^D \to 2^D$ can be succinctly
represented by a relation $R_f \subseteq (D \cup \{0\}) \times (D \cup \{0\})$ defined as:

$$R_f := \{(0, 0)\} \cup \{(0, b) \mid b \in f(\emptyset)\} \cup \{(a, b) \mid b \in f(\{a\}) - f(\emptyset)\}.$$

Given that $f$ is distributive over union, we have $f(\{d_1, \ldots, d_k\}) = f(\{d_1\}) \cup \cdots \cup
f(\{d_k\})$. Hence, to specify $f$ it is sufficient to specify $f(\emptyset)$ and $f(\{d\})$ for each
$d \in D$. This is exactly what $R_f$ does. In short, we have: $f(\emptyset) = \{b \in D \mid (0, b) \in
R_f\}$ and $f(\{d\}) = f(\emptyset) \cup \{b \in D \mid (d, b) \in R_f\}$. Moreover, we can represent the
relation $R_f$ as a bipartite graph $H_f$ in which each part consists of the vertices
$D \cup \{0\}$ and $R_f$ is the set of edges. For brevity, we define $D^* := D \cup \{0\}$.
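A small sketch of this encoding, assuming the usual IFDS definition $R_f = \{(0,0)\} \cup \{(0,b) \mid b \in f(\emptyset)\} \cup \{(a,b) \mid b \in f(\{a\}) - f(\emptyset)\}$: the function `relation_of` builds the relation from a distributive function and `apply_relation` recovers the function from it. Both names, and the string `"0"` used for the special element, are hypothetical.

```python
ZERO = "0"   # the special element 0 of D* = D ∪ {0}

def relation_of(f, D):
    """Build R_f from a function f that is distributive over union."""
    base = f(frozenset())
    rel = {(ZERO, ZERO)}
    rel |= {(ZERO, b) for b in base}
    rel |= {(d, b) for d in D for b in f(frozenset({d})) - base}
    return rel

def apply_relation(rel, facts):
    """Evaluate the represented function on a set of facts."""
    sources = {ZERO} | set(facts)
    return {b for (a, b) in rel if a in sources and b != ZERO}

D = {"a", "b"}
f = lambda x: set(x) | {"a"}      # f(x) = x ∪ {a}
R_f = relation_of(f, D)
```

The relation has at most $|D^*|^2$ pairs, whereas a table of $f$ over all subsets of $D$ would need $2^{|D|}$ entries.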


[Figure: five bipartite graphs, each with parts 0, a, b, representing distributive functions such as $\lambda x.\, x \cup \{a\}$.]


Fig. 3: Succinct representation of several distributive functions. 


Example 3. Let $D = \{a, b\}$. Figure 3 provides several examples of bipartite
graphs representing distributive functions.


Bounded Bandwidth Assumption. Following [50], we assume that the bandwidth
in function calls and returns is bounded by a constant. In other words, there is
a small constant $b$, such that for every edge $e$ that is a call-to-start or exit-to-return-site
edge, every vertex in the graph representation $H_{M(e)}$ has degree $b$ or
less. This is a classical assumption in IFDS [50,7] and models the fact that every
parameter in a called function is only dependent on a few variables in the caller
(and conversely, every returned value is only dependent on a few variables in the
called function).


Composition of distributive functions. Let $f$ and $g$ be distributive functions and
$R_f$ and $R_g$ their succinct representations. It is easy to verify that $g \circ f$ is also
distributive, hence it has a succinct representation $R_{g \circ f}$. Moreover, we have
$R_{g \circ f} = R_f ; R_g = \{(a, b) \mid \exists c\ (a, c) \in R_f \land (c, b) \in R_g\}$.


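[Figure: the graphs $H_f$ and $H_g$ with $f = \lambda x.\, x \cup \{a\}$, stacked and contracted (left), and the resulting graph $H_{g \circ f}$ (right).]


Fig. 4: Obtaining $H_{g \circ f}$ (right) from $H_f$ and $H_g$ (left)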


Example 4. In terms of graphs, to compute $H_{g \circ f}$, we first take $H_f$ and $H_g$, then
contract corresponding vertices in the lower part of $H_f$ and the upper part of
$H_g$, and finally compute reachability from the topmost part to the bottommost
part of the resulting graph. Consider $f(x) = x \cup \{a\}$, $g(x) = \{a\}$ for $x \neq \emptyset$
and $g(\emptyset) = \emptyset$, then $g \circ f(x) = \{a\}$ for all $x \subseteq D$. Figure 4 shows contracting
of corresponding vertices in $H_f$ and $H_g$ (left) and using reachability to obtain
$H_{g \circ f}$ (right).
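The composition of this example can be replayed on the relations directly. The sketch below writes down relations for $f$ and $g$ over $D = \{a, b\}$ by hand (an assumption based on the encoding with a special element 0), composes them, and evaluates the result on every subset of $D$; all names are hypothetical.

```python
from itertools import combinations

ZERO = "0"   # the special element 0 of D*

def compose(rf, rg):
    """R_f ; R_g: pairs (a, b) linked through some middle element c."""
    return {(a, b) for (a, c) in rf for (c2, b) in rg if c == c2}

def apply_relation(rel, facts):
    """Evaluate the function represented by rel on a set of facts."""
    sources = {ZERO} | set(facts)
    return {b for (a, b) in rel if a in sources and b != ZERO}

rf = {(ZERO, ZERO), (ZERO, "a"), ("b", "b")}   # f(x) = x ∪ {a}
rg = {(ZERO, ZERO), ("a", "a"), ("b", "a")}    # g(x) = {a} if x is non-empty, else empty
rgf = compose(rf, rg)

subsets = [set(c) for r in range(3) for c in combinations(["a", "b"], r)]
results = [apply_relation(rgf, s) for s in subsets]
```

The composed relation may contain redundant pairs such as (b, a), but it still represents the constant function that returns {a}.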


Exploded supergraph. Given an IFDS instance $I = (G, D, F, M, \cup)$ with supergraph
$G = (V, E)$, its exploded supergraph $\overline{G}$ is obtained by taking $|D^*|$ copies of
each vertex in $V$, one corresponding to each element of $D^*$, and replacing each
edge $e$ with the graph representation $H_{M(e)}$ of the flow function $M(e)$. Formally,
$\overline{G} = (\overline{V}, \overline{E})$ where $\overline{V} = V \times D^*$ and

$$\overline{E} = \{((u, d_1), (v, d_2)) \mid e = (u, v) \in E \land (d_1, d_2) \in R_{M(e)}\}.$$

A path $\overline{P}$ in $\overline{G}$ is (same-context) valid, if the path $P$ in $G$, obtained by ignoring
the second component of every vertex in $\overline{P}$, is (same-context) valid. As shown
in [50], for a data flow fact $d \in D$ and a vertex $v \in V$, we have $d \in \mathrm{MVP}_v$ iff
there is a valid path in $\overline{G}$ from $(s_{main}, d')$ to $(v, d)$ for some $d' \in D \cup \{0\}$. Hence,
the IFDS problem is reduced to reachability by valid paths in $\overline{G}$. Similarly, the
same-context IFDS problem is reduced to reachability by same-context valid
paths in $\overline{G}$.
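As an illustration of the reduction, the sketch below explodes a tiny call-free graph, where every path is trivially valid, and answers a data-flow question by plain reachability. Handling calls would additionally require restricting the search to valid paths, which this sketch does not attempt; the names and the example relations are hypothetical.

```python
ZERO = "0"   # the special element 0 of D*

def explode(edges, M):
    """Exploded edges ((u, d1), (v, d2)) for each (u, v) and (d1, d2) in R_M(u,v)."""
    return {((u, d1), (v, d2)) for (u, v) in edges for (d1, d2) in M[(u, v)]}

def reachable(exploded, source):
    """Nodes of the exploded graph reachable from source."""
    seen, stack = {source}, [source]
    while stack:
        node = stack.pop()
        for (a, b) in exploded:
            if a == node and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen

edges = [(1, 2), (2, 3)]
M = {
    (1, 2): {(ZERO, ZERO), (ZERO, "a")},   # generates fact a
    (2, 3): {(ZERO, ZERO), ("a", "b")},    # turns a into b and drops a
}
exploded = explode(edges, M)
facts_at_3 = {d for (v, d) in reachable(exploded, (1, ZERO)) if v == 3 and d != ZERO}
```

Here the fact b may hold at vertex 3 because the exploded graph contains the path (1, 0), (2, a), (3, b).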


Example 5. Consider a null pointer analysis on the program in Figure 2. At each
program point, we want to know which pointers can potentially be null. We first
model this problem as an IFDS instance. Let $D = \{\bar{x}, \bar{y}\}$, where $\bar{x}$ is the data
flow fact that x might be null and $\bar{y}$ is defined similarly. Figure 5 shows the same
program and its exploded supergraph.

At point 8, the values of both pointers x and y are used. Hence, if either of
x or y is null at 8, a null pointer error will be raised. However, as evidenced by
the two valid paths shown in red, both x and y might be null at 8. The pointer
y might be null because it is passed to the function f by value (instead of by
reference) and keeps its local value in the transition from $c_7$ to $r_7$, hence the
edge $((c_7, \bar{y}), (r_7, \bar{y}))$ is in $\overline{G}$. On the other hand, the function f only initializes
y, which is its own local variable, and does not change x (which is shared with
main).


1 void f(int *&x, int *y) {
2   y = new int(1);
3   y = new int(2);
4 }
5 int main() {
6   int *x, *y;
7   f(x,y);
8   *x += *y;
9 }

[Exploded supergraph drawing not recoverable from the extracted text; its surviving labels include c_7 and r_7.]


Fig. 5: A Program (left) and its Exploded Supergraph (right). 


2.2 Trees and Tree Decompositions 


Trees. A rooted tree $T = (V_T, E_T)$ is an undirected graph with a distinguished
“root” vertex $r \in V_T$, in which there is a unique path $P^u_v$ between every pair
$\{u, v\}$ of vertices. We refer to the number of vertices in $V_T$ as the size of $T$. For
an arbitrary vertex $v \in V_T$, the depth of $v$, denoted by $d_v$, is defined as the length
of the unique path $P^r_v : r \rightsquigarrow v$. The depth or height of $T$ is the maximum depth
among its vertices. A vertex $u$ is called an ancestor of $v$ if $u$ appears in $P^r_v$. In
this case, $v$ is called a descendant of $u$. In particular, $r$ is an ancestor of every
vertex and each vertex is both an ancestor and a descendant of itself. We denote
the set of ancestors of $v$ by $A^{\uparrow}_v$ and its descendants by $D^{\downarrow}_v$. It is straightforward
to see that for every $0 \le d \le d_v$, the vertex $v$ has a unique ancestor with depth
$d$. We denote this ancestor by $a^d_v$. The ancestor $p_v = a^{d_v - 1}_v$ of $v$ at depth $d_v - 1$
is called the parent of $v$ and $v$ is a child of $p_v$. The subtree $T^{\downarrow}_v$ corresponding to
$v$ is defined as $T[D^{\downarrow}_v] = (D^{\downarrow}_v, E_T \cap 2^{D^{\downarrow}_v})$, i.e. the part of $T$ that consists of $v$ and
its descendants. Finally, a vertex $v \in V_T$ is called a leaf if it has no children.
Given two vertices $u, v \in V_T$, the lowest common ancestor $lca(u, v)$ of $u$ and $v$ is
defined as $\arg\max_{w \in A^{\uparrow}_u \cap A^{\uparrow}_v} d_w$. In other words, $lca(u, v)$ is the common ancestor
of $u$ and $v$ with maximum depth, i.e. which is farthest from the root.

Lemma 1 ([35]). Given a rooted tree $T$ of size $n$, there is an algorithm that
preprocesses $T$ in $O(n)$ and can then answer lowest common ancestor queries,
i.e. queries that provide two vertices $u$ and $v$ and ask for $lca(u, v)$, in $O(1)$.
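The definitions of depth and lowest common ancestor can be illustrated by a minimal Python sketch. It is not the constant-time algorithm of Lemma 1: it walks parent pointers, so a query costs time proportional to the height of the tree. The parent-map representation and the small example tree are assumptions made for illustration only.

```python
from typing import Dict, Optional


def compute_depths(parent: Dict[str, Optional[str]]) -> Dict[str, int]:
    """Depth d_v of every vertex; the root is the vertex whose parent is None."""
    depth: Dict[str, int] = {}

    def depth_of(v: str) -> int:
        if v not in depth:
            p = parent[v]
            depth[v] = 0 if p is None else depth_of(p) + 1
        return depth[v]

    for v in parent:
        depth_of(v)
    return depth


def lca(parent: Dict[str, Optional[str]], depth: Dict[str, int], u: str, v: str) -> str:
    """Common ancestor of u and v with maximum depth (naive parent walk)."""
    # Lift the deeper vertex until both vertices are at the same depth.
    while depth[u] > depth[v]:
        u = parent[u]
    while depth[v] > depth[u]:
        v = parent[v]
    # Climb in lockstep until the two walks meet.
    while u != v:
        u, v = parent[u], parent[v]
    return u


# Hypothetical tree: r has children a and b; a has children c and d.
example_parent = {"r": None, "a": "r", "b": "r", "c": "a", "d": "a"}
example_depth = compute_depths(example_parent)
```

Since every vertex is an ancestor of itself, `lca(example_parent, example_depth, "a", "c")` returns `"a"`.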


Tree decompositions [52]. Given a graph $G = (V, E)$, a tree decomposition of $G$
is a rooted tree $T = (\mathfrak{B}, E_T)$ such that:

(i) Each vertex $b \in \mathfrak{B}$ of $T$ has an associated subset $V(b) \subseteq V$ of vertices of
$G$ and $\bigcup_{b \in \mathfrak{B}} V(b) = V$. For clarity, we call each vertex of $T$ a “bag” and
reserve the word vertex for $G$. Informally, each vertex must appear in some
bag.

(ii) For all $(u, v) \in E$, there exists a bag $b \in \mathfrak{B}$ such that $u, v \in V(b)$, i.e. every
edge should appear in some bag.

(iii) For any pair of bags $b_i, b_j \in \mathfrak{B}$ and any bag $b_k$ that appears in the path
$P : b_i \rightsquigarrow b_j$, we have $V(b_i) \cap V(b_j) \subseteq V(b_k)$, i.e. each vertex should appear
in a connected subtree of $T$.

The width of the tree decomposition $T = (\mathfrak{B}, E_T)$ is defined as the size of its
largest bag minus 1. The treewidth $tw(G)$ of a graph $G$ is the minimal width
among its tree decompositions. A vertex $v \in V$ appears in a connected subtree,
so there is a unique bag $b$ with the smallest possible depth such that $v \in V(b)$.
We call $b$ the root bag of $v$ and denote it by $rb(v)$.
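Conditions (i)-(iii), the width and the root bag can be checked mechanically. The following Python sketch does so on a small instance. The four bags are those named in Example 6; the tree shape (b1 as root, b2 below it, b3 and b4 below b2) and the graph edges are assumptions chosen to be consistent with that example, since the drawing itself is not available.

```python
def is_tree_decomposition(vertices, edges, bags, tree_edges):
    """Check conditions (i)-(iii) for bags (dict: bag name -> set of vertices)."""
    # (i) every vertex of G appears in some bag
    if set().union(*bags.values()) != set(vertices):
        return False
    # (ii) every edge of G appears in some bag
    for u, v in edges:
        if not any(u in bag and v in bag for bag in bags.values()):
            return False
    # (iii) the bags containing a given vertex form a connected subtree of T
    adj = {b: set() for b in bags}
    for a, b in tree_edges:
        adj[a].add(b)
        adj[b].add(a)
    for x in vertices:
        holding = {b for b, bag in bags.items() if x in bag}
        start = next(iter(holding))
        seen, stack = {start}, [start]
        while stack:
            b = stack.pop()
            for c in adj[b]:
                if c in holding and c not in seen:
                    seen.add(c)
                    stack.append(c)
        if seen != holding:
            return False
    return True


def width(bags):
    """Size of the largest bag minus 1."""
    return max(len(bag) for bag in bags.values()) - 1


def root_bag(v, bags, bag_depth):
    """rb(v): the bag of smallest depth that contains v."""
    return min((b for b in bags if v in bags[b]), key=lambda b: bag_depth[b])


vertices = ["v1", "v2", "v3", "v4", "v5", "v6", "v7"]
# Hypothetical edge set; every edge lies inside one of the bags below.
edges = [("v1", "v2"), ("v1", "v5"), ("v2", "v5"), ("v2", "v3"), ("v3", "v5"),
         ("v3", "v4"), ("v4", "v5"), ("v2", "v6"), ("v6", "v7"), ("v2", "v7")]
bags = {"b1": {"v1", "v2", "v5"}, "b2": {"v2", "v3", "v5"},
        "b3": {"v3", "v4", "v5"}, "b4": {"v2", "v6", "v7"}}
tree_edges = [("b1", "b2"), ("b2", "b3"), ("b2", "b4")]  # assumed tree shape
bag_depth = {"b1": 0, "b2": 1, "b3": 2, "b4": 2}
```

Adding an edge such as (v1, v7), whose endpoints share no bag, makes the check fail on condition (ii).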



Fig. 6: A Graph G (left) and its Tree Decomposition T (right). 


It is well-known that flow graphs of programs typically have small treewidth [63].
For example, programs written in Pascal, C, and Solidity have treewidth at most
3, 6 and 9, respectively. This property has also been confirmed experimentally
for programs written in Java [34], C [38] and Ada [12]. The challenge is thus to
exploit treewidth for faster interprocedural on-demand analyses. The first step
in this approach is to compute tree decompositions of graphs. As the following
lemma states, tree decompositions of low-treewidth graphs can be computed
efficiently.


Lemma 2 ([11]). Given a graph $G$ with constant treewidth $t$, a binary tree
decomposition of size $O(n)$ bags, height $O(\log n)$ and width $O(t)$ can be computed
in linear time.


Separators [20]. The key structural property that we exploit in low-treewidth
flow graphs is a separation property. Let $A, B \subseteq V$. The pair $(A, B)$ is called a
separation of $G$ if (i) $A \cup B = V$, and (ii) no edge connects a vertex in $A - B$
to a vertex in $B - A$ or vice versa. If $(A, B)$ is a separation, the set $A \cap B$ is
called a separator. The following lemma states such a separation property for
low-treewidth graphs.


Lemma 3 (Cut Property [26]). Let $T = (\mathfrak{B}, E_T)$ be a tree decomposition
of $G = (V, E)$ and $e = \{b, b'\} \in E_T$. If we remove $e$, the tree $T$ breaks into
two connected components, $T^b$ and $T^{b'}$, respectively containing $b$ and $b'$. Let
$A = \bigcup_{t \in T^b} V(t)$ and $B = \bigcup_{t \in T^{b'}} V(t)$. Then $(A, B)$ is a separation of $G$ and its
corresponding separator is $A \cap B = V(b) \cap V(b')$.


Example 6. Figure 6 shows a graph and one of its tree decompositions with width
2. In this example, we have $rb(v_5) = b_1$, $rb(v_3) = b_2$, $rb(v_4) = b_3$, and $rb(v_7) = b_4$.
For the separator property of Lemma 3, consider the edge $\{b_2, b_4\}$. By removing
it, $T$ breaks into two parts, one containing the vertices $A = \{v_1, v_2, v_3, v_4, v_5\}$
and the other containing $B = \{v_2, v_6, v_7\}$. We have $A \cap B = \{v_2\} = V(b_2) \cap
V(b_4)$. Also, any path from $B - A = \{v_6, v_7\}$ to $A - B = \{v_1, v_3, v_4, v_5\}$ or vice
versa must pass through $\{v_2\}$. Hence, $(A, B)$ is a separation of $G$ with separator
$V(b_2) \cap V(b_4) = \{v_2\}$.
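The cut property can be replayed on the numbers of Example 6 with a short Python sketch. As the figure is not available, the tree shape and the graph edges below are assumptions that agree with the bags and root bags stated in the example; only the sets $A$, $B$ and the separator $\{v_2\}$ are taken from the text.

```python
def is_separation(vertices, edges, A, B):
    """(A, B) is a separation: A ∪ B = V and no edge joins A - B with B - A."""
    if A | B != set(vertices):
        return False
    only_a, only_b = A - B, B - A
    return not any((u in only_a and v in only_b) or (u in only_b and v in only_a)
                   for u, v in edges)


def cut(bags, tree_edges, removed):
    """Remove one tree edge and return the vertex sets (A, B) of the two parts."""
    adj = {b: set() for b in bags}
    for a, b in tree_edges:
        if {a, b} != set(removed):
            adj[a].add(b)
            adj[b].add(a)

    def component(start):
        seen, stack = {start}, [start]
        while stack:
            x = stack.pop()
            for y in adj[x] - seen:
                seen.add(y)
                stack.append(y)
        return seen

    first, second = removed
    A = set().union(*(bags[x] for x in component(first)))
    B = set().union(*(bags[x] for x in component(second)))
    return A, B


vertices = ["v1", "v2", "v3", "v4", "v5", "v6", "v7"]
# Hypothetical edge set; every edge lies inside one of the bags below.
edges = [("v1", "v2"), ("v1", "v5"), ("v2", "v5"), ("v2", "v3"), ("v3", "v5"),
         ("v3", "v4"), ("v4", "v5"), ("v2", "v6"), ("v6", "v7"), ("v2", "v7")]
bags = {"b1": {"v1", "v2", "v5"}, "b2": {"v2", "v3", "v5"},
        "b3": {"v3", "v4", "v5"}, "b4": {"v2", "v6", "v7"}}
tree_edges = [("b1", "b2"), ("b2", "b3"), ("b2", "b4")]  # assumed tree shape

A, B = cut(bags, tree_edges, ("b2", "b4"))
```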


3 Problem definition 


We consider same-context IFDS problems in which the flow graphs $G_i$ have a
treewidth of at most $t$ for a fixed constant $t$. We extend the classical notion of
same-context IFDS solution in two ways: (i) we allow arbitrary start points for
the analysis, i.e. we do not limit our analyses to same-context valid paths that
start at $s_{main}$; and (ii) instead of a one-shot algorithm, we consider a two-phase
process in which the algorithm first preprocesses the input instance and is then
provided with a series of queries to answer. We formalize these points below. We
fix an IFDS instance $I = (G, D, F, M, \cup)$ with exploded supergraph $\bar{G} = (\bar{V}, \bar{E})$.

Meet over same-context valid paths. We extend the definition of MSCP by
specifying a start vertex $u$ and an initial set $\Delta$ of data flow facts that hold at $u$.
Formally, for any vertex $v$ that is in the same flow graph as $u$, we define:


$$\mathrm{MSCP}_{u,\Delta,v} = \mathop{\sqcap}\limits_{P \in \mathrm{SCVP}(u,v)} pf_P(\Delta). \tag{2}$$


The only difference between (1) and (2) is that in (1), the start vertex $u$ is fixed
as $s_{main}$ and the initial data-fact set $\Delta$ is fixed as $D$, while in (2), they are free
to be any vertex/set.


Reduction to reachability. As explained in Section 2.1, computing MSCP is
reduced to reachability via same-context valid paths in the exploded supergraph
$\bar{G}$. This reduction does not depend on the start vertex and initial data flow facts.
Hence, for a data flow fact $d \in D$, we have $d \in \mathrm{MSCP}_{u,\Delta,v}$ iff in the exploded
supergraph $\bar{G}$ the vertex $(v, d)$ is reachable via same-context valid paths from
a vertex $(u, \delta)$ for some $\delta \in \Delta \cup \{\mathbf{0}\}$. Hence, we define the following types of
queries:




Pair query. A pair query provides two vertices $(u, d_1)$ and $(v, d_2)$ of the exploded
supergraph $\bar{G}$ and asks whether they are reachable by a same-context valid path.
Hence, the answer to a pair query is a single bit. Intuitively, if $d_2 = \mathbf{0}$, then
the query is simply asking if $v$ is reachable from $u$ by a same-context valid
path in $G$. Otherwise, $d_2$ is a data flow fact and the query is asking whether
$d_2 \in \mathrm{MSCP}_{u,\{d_1\} \cap D,v}$.

Single-source query. A single-source query provides a vertex $(u, d_1)$ and asks for
all vertices $(v, d_2)$ that are reachable from $(u, d_1)$ by a same-context valid path.
Assuming that $u$ is in the flow graph $G_i = (V_i, E_i)$, the answer to the single-source
query is a sequence of $|V_i| \cdot |D^*|$ bits, one for each $(v, d_2) \in V_i \times D^*$, signifying
whether it is reachable by same-context valid paths from $(u, d_1)$. Intuitively, a
single-source query asks for all pairs $(v, d_2)$ such that (i) $v$ is reachable from $u$
by a same-context valid path and (ii) $d_2 \in \mathrm{MSCP}_{u,\{d_1\} \cap D,v} \cup \{\mathbf{0}\}$.

Intuition. We note the intuition behind such queries. We observe that since the
functions in $F$ are distributive over $\cup$, we have $\mathrm{MSCP}_{u,\Delta,v} = \bigcup_{\delta \in \Delta} \mathrm{MSCP}_{u,\{\delta\},v}$;
hence $\mathrm{MSCP}_{u,\Delta,v}$ can be computed by $O(|\Delta|)$ single-source queries.


4 Treewidth-based Data-flow Analysis


4.1 Preprocessing 


The original solution to the IFDS problem, as first presented in [50], reduces
the problem to reachability over a newly constructed graph. We follow a similar
approach, except that we exploit the low-treewidth property of our flow
graphs at every step. Our preprocessing is described below. It starts with
computing constant-width tree decompositions for each of the flow graphs. We then
use standard techniques to make sure that our tree decompositions have a nice
form, i.e. that they are balanced and binary. Then comes a reduction to
reachability, which is similar to [50]. Finally, we precompute specific useful reachability
information between vertices in each bag and its ancestors. As it turns out in
the next section, this information is sufficient for computing reachability between
any pair of vertices, and hence for answering IFDS queries.


Overview. Our preprocessing consists of the following steps:

(1) Finding Tree Decompositions. In this step, we compute a tree
decomposition $T_i = (\mathfrak{B}_i, E_{T_i})$ of constant width $t$ for each flow graph $G_i$. This can
either be done by applying the algorithm of [10] directly on $G_i$, or by using
an algorithm due to Thorup [63] and parsing the program.

(2) Balancing and Binarizing. In this step, we balance the tree
decompositions $T_i$ using the algorithm of Lemma 2 and make them binary using the
standard process of [22].

(3) LCA Preprocessing. We preprocess the $T_i$'s for answering lowest common
ancestor queries using Lemma 1.

(4) Reduction to Reachability. In this step, we modify the exploded
supergraph $\bar{G} = (\bar{V}, \bar{E})$ to obtain a new graph $\hat{G} = (\bar{V}, \hat{E})$, such that for every
pair of vertices $(u, d_1)$ and $(v, d_2)$, there is a path from $(u, d_1)$ to $(v, d_2)$ in
$\hat{G}$ iff there is a same-context valid path from $(u, d_1)$ to $(v, d_2)$ in $\bar{G}$. So, this
step reduces the problem of reachability via same-context valid paths in $\bar{G}$
to simple reachability in $\hat{G}$.

(5) Local Preprocessing. In this step, for each pair of vertices $(u, d_1)$ and
$(v, d_2)$ for which there exists a bag $b$ such that both $u$ and $v$ appear in $b$, we
compute and cache whether $(u, d_1) \rightsquigarrow (v, d_2)$ in $\hat{G}$. We write $(u, d_1) \rightsquigarrow_{local}
(v, d_2)$ to denote a reachability established in this step.

(6) Ancestors Reachability Preprocessing. In this step, we compute
reachability information between each vertex in a bag and vertices appearing in
its ancestors in the tree decomposition. Concretely, for each pair of vertices
$(u, d_1)$ and $(v, d_2)$ such that $u$ appears in a bag $b$ and $v$ appears in a bag $b'$
that is an ancestor of $b$, we establish and remember whether $(u, d_1) \rightsquigarrow (v, d_2)$
in $\hat{G}$ and whether $(v, d_2) \rightsquigarrow (u, d_1)$ in $\hat{G}$. As above, we use the notations
$(u, d_1) \rightsquigarrow_{anc} (v, d_2)$ and $(v, d_2) \rightsquigarrow_{anc} (u, d_1)$.

Steps (1)-(3) above are standard and well-known processes. We now provide
details of steps (4)-(6). To skip the details and read about the query phase, see
Section 4.3 below.


Step (4): Reduction to Reachability 


In this step, our goal is to compute a new graph $\hat{G}$ from the exploded supergraph
$\bar{G}$ such that there is a path from $(u, d_1)$ to $(v, d_2)$ in $\hat{G}$ iff there is a same-context
valid path from $(u, d_1)$ to $(v, d_2)$ in $\bar{G}$. The idea behind this step is the same as
that of the tabulation algorithm in [50].


Summary edges. Consider a call vertex $c_l$ in $G$ and its corresponding return-site
vertex $r_l$. For $d_1, d_2 \in D^*$, the edge $((c_l, d_1), (r_l, d_2))$ is called a summary edge
if there is a same-context valid path from $(c_l, d_1)$ to $(r_l, d_2)$ in the exploded
supergraph $\bar{G}$. Intuitively, a summary edge summarizes the effects of procedure
calls (same-context interprocedural paths) on the reachability between $c_l$ and
$r_l$. From the definition of summary edges, it is straightforward to verify that the
graph $\hat{G}$ obtained from $\bar{G}$ by adding every summary edge and removing every
interprocedural edge has the desired property, i.e. a pair of vertices are reachable
in $\hat{G}$ iff they are reachable by a same-context valid path in $\bar{G}$. Hence, we first
find all summary edges and then compute $\hat{G}$. This is shown in Algorithm 1.

We now describe what Algorithm 1 does. Let $s_p$ be the start point of a
procedure $p$. A shortcut edge is an edge $((s_p, d_1), (v, d_2))$ such that $v$ is in the
same procedure $p$ and there is a same-context valid path from $(s_p, d_1)$ to $(v, d_2)$ in
$\bar{G}$. The algorithm creates an empty graph $H = (\bar{V}, E')$. Note that $H$ is implicitly
represented by only saving $E'$. It also creates a queue $Q$ of edges to be added to
$H$ (initially $Q = \bar{E}$) and an empty set $S$ which will store the summary edges.
The goal is to construct $H$ such that it contains (i) intraprocedural edges of $\bar{G}$,
(ii) summary edges, and (iii) shortcut edges.

It constructs $H$ one edge at a time. While there is an unprocessed
intraprocedural edge $e = ((u, d_1), (v, d_2))$ in $Q$, it chooses one such $e$ and adds it to $H$
(lines 5-10). Then, if $(u, d_1)$ is reachable from $(s_p, d_3)$ via a same-context valid
path, then by adding the edge $e$, the vertex $(v, d_2)$ also becomes accessible from
$(s_p, d_3)$. Hence, it adds the shortcut edge $((s_p, d_3), (v, d_2))$ to $Q$, so that it is later
added to the graph $H$. Moreover, if $u$ is the start $s_p$ of the procedure $p$ and $v$ is
its end $e_p$, then for every call vertex $c_l$ calling the procedure $p$ and its respective
return-site $r_l$, we can add summary edges that summarize the effect of calling $p$
(lines 14-19). Finally, lines 20-24 compute $\hat{G}$ as discussed above.


Algorithm 1: Computing $\hat{G}$ in Step (4)

 1  $Q \leftarrow \bar{E}$;
 2  $S \leftarrow \emptyset$;
 3  $E' \leftarrow \emptyset$;
 4  while $Q \ne \emptyset$ do
 5      Choose $e = ((u, d_1), (v, d_2)) \in Q$;
 6      $Q \leftarrow Q - \{e\}$;
 7      if $(u, v)$ is an interprocedural edge, i.e. a call-to-start or exit-to-return-site edge then
 8          continue;
 9      $p \leftarrow$ the procedure s.t. $u, v \in V_p$;
10      $E' \leftarrow E' \cup \{e\}$;
11      foreach $d_3$ s.t. $((s_p, d_3), (u, d_1)) \in E'$ do
12          if $((s_p, d_3), (v, d_2)) \notin E' \cup Q$ then
13              $Q \leftarrow Q \cup \{((s_p, d_3), (v, d_2))\}$;
14      if $u = s_p$ and $v = e_p$ then
15          foreach $(c_l, d_3)$ s.t. $((c_l, d_3), (u, d_1)) \in \bar{E}$ do
16              foreach $d_4$ s.t. $((v, d_2), (r_l, d_4)) \in \bar{E}$ do
17                  if $((c_l, d_3), (r_l, d_4)) \notin E' \cup Q$ then
18                      $S \leftarrow S \cup \{((c_l, d_3), (r_l, d_4))\}$;
19                      $Q \leftarrow Q \cup \{((c_l, d_3), (r_l, d_4))\}$;
20  $\hat{G} \leftarrow \bar{G}$;
21  foreach $e = ((u, d_1), (v, d_2)) \in \bar{E}$ do
22      if $u$ and $v$ are not in the same procedure then
23          $\hat{G} = \hat{G} - \{e\}$;
24  $\hat{G} \leftarrow \hat{G} \cup S$;


Correctness. As argued above, every edge that is added to H is either intrapro- 
cedural, a summary edge or a shortcut edge. Moreover, all such edges are added 
to H, because H is constructed one edge at a time and every time an edge e 
is added to H, all the summary/shortcut edges that might occur as a result 
of adding e to H are added to the queue Q and hence later to H. Therefore, 
Algorithm 1 correctly computes summary edges and the graph $\hat{G}$.


Complexity. Note that the graph $H$ has at most $O(|E| \cdot |D^*|^2)$ edges. Addition
of each edge corresponds to one iteration of the while loop at line 4 of
Algorithm 1. Moreover, each iteration takes $O(|D^*|)$ time, because the loop at line
11 iterates over at most $|D^*|$ possible values for $d_3$ and the loops at lines 15
and 16 have constantly many iterations due to the bounded bandwidth
assumption (Section 2.1). Since $|D^*| = O(|D|)$ and $|E| = O(n)$, the total runtime of
Algorithm 1 is $O(n \cdot |D|^3)$. For a more detailed analysis, see [50, Appendix].
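The goal of Step (4) can be illustrated with a small Python sketch. It is deliberately not Algorithm 1 and does not achieve its running time: it computes the same summary edges by a naive fixpoint, repeatedly checking which exit vertices are reachable from each callee start vertex using intraprocedural and already-found summary edges. The function names, the `proc_of` and `return_site` maps, and the two-procedure instance are assumptions made for illustration.

```python
from collections import defaultdict


def reachable(edges, source):
    """All vertices reachable from source (source included) along edges."""
    succ = defaultdict(set)
    for a, b in edges:
        succ[a].add(b)
    seen, stack = {source}, [source]
    while stack:
        a = stack.pop()
        for b in succ[a] - seen:
            seen.add(b)
            stack.append(b)
    return seen


def build_g_hat(edges, proc_of, return_site):
    """Return (edges of G-hat, summary edges) by naive fixpoint iteration.

    edges:       set of ((u, d1), (v, d2)) edges of the exploded supergraph
    proc_of:     maps every flow-graph vertex to its procedure
    return_site: maps every call vertex to its return-site vertex
    """
    intra = {(a, b) for a, b in edges if proc_of[a[0]] == proc_of[b[0]]}
    inter = edges - intra
    calls = [(a, b) for a, b in inter if a[0] in return_site]        # call-to-start
    returns = [(a, b) for a, b in inter if a[0] not in return_site]  # exit-to-return-site
    summary = set()
    changed = True
    while changed:
        changed = False
        current = intra | summary
        for (c, d3), start in calls:
            inside = reachable(current, start)  # stays within the callee
            for exit_vertex, (r, d4) in returns:
                if r == return_site[c] and exit_vertex in inside:
                    edge = ((c, d3), (r, d4))
                    if edge not in summary:
                        summary.add(edge)
                        changed = True
    return intra | summary, summary


# Hypothetical instance: main calls f at c (return site r); facts are 0 and "x".
E = {
    (("m0", 0), ("c", 0)), (("m0", 0), ("c", "x")),
    (("c", 0), ("sf", 0)), (("c", "x"), ("sf", "x")),      # call-to-start
    (("sf", 0), ("ef", 0)), (("sf", "x"), ("ef", "x")),    # body of f
    (("ef", 0), ("r", 0)), (("ef", "x"), ("r", "x")),      # exit-to-return-site
    (("r", 0), ("me", 0)), (("r", "x"), ("me", "x")),
}
proc_of = {"m0": "main", "c": "main", "r": "main", "me": "main",
           "sf": "f", "ef": "f"}
return_site = {"c": "r"}
g_hat, summary = build_g_hat(E, proc_of, return_site)
```

In the resulting graph every edge stays inside one procedure, so plain reachability in `g_hat` coincides with same-context valid reachability in the original instance.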


Step (5): Local Preprocessing 


In this step, we compute the set $R_{local}$ of local reachability edges, i.e. edges
of the form $((u, d_1), (v, d_2))$ such that $u$ and $v$ appear in the same bag $b$ of a
tree decomposition $T_i$ and $(u, d_1) \rightsquigarrow (v, d_2)$ in $\hat{G}$. We write $(u, d_1) \rightsquigarrow_{local} (v, d_2)$
to denote $((u, d_1), (v, d_2)) \in R_{local}$. Note that $\hat{G}$ has no interprocedural edges.
Hence, we can process each $T_i$ separately. We use a divide-and-conquer technique
similar to the kernelization method used in [22] (Algorithm 2).

Algorithm 2 processes each tree decomposition $T_i$ separately. When
processing $T$, it chooses a leaf bag $b_l$ of $T$ and computes all-pairs reachability on the
induced subgraph $H_l = \hat{G}[V(b_l) \times D^*]$, consisting of vertices that appear in $b_l$.
Then, for each pair of vertices $(u, d_1)$ and $(v, d_2)$ s.t. $u$ and $v$ appear in $b_l$ and
$(u, d_1) \rightsquigarrow (v, d_2)$ in $H_l$, the algorithm adds the edge $((u, d_1), (v, d_2))$ to both
$R_{local}$ and $\hat{G}$ (lines 7-9). Note that this does not change reachability relations in
$\hat{G}$, given that the vertices connected by the new edge were reachable by a path
before adding it. Then, if $b_l$ is not the only bag in $T$, the algorithm recursively
calls itself over the tree decomposition $T - b_l$, i.e. the tree decomposition obtained
by removing $b_l$ (lines 10-11). Finally, it repeats the reachability computation on
$H_l$ (lines 12-14). The running time of the algorithm is $O(n \cdot |D^*|^3)$.


Algorithm 2: Local Preprocessing in Step (5)

 1  $R_{local} \leftarrow \emptyset$;
 2  foreach $T_i$ do
 3      computeLocalReachability($T_i$);
 4  Function computeLocalReachability($T$)
 5      Choose a leaf bag $b_l$ of $T$;
 6      $b_p \leftarrow$ parent of $b_l$;
 7      foreach $u, v \in V(b_l)$, $d_1, d_2 \in D^*$ s.t. $(u, d_1) \rightsquigarrow (v, d_2)$ in $\hat{G}[V(b_l) \times D^*]$ do
 8          $\hat{G} = \hat{G} \cup \{((u, d_1), (v, d_2))\}$;
 9          $R_{local} = R_{local} \cup \{((u, d_1), (v, d_2))\}$;
10      if $b_p \ne$ null then
11          computeLocalReachability($T - b_l$);
12          foreach $u, v \in V(b_l)$, $d_1, d_2 \in D^*$ s.t. $(u, d_1) \rightsquigarrow (v, d_2)$ in $\hat{G}[V(b_l) \times D^*]$ do
13              $\hat{G} = \hat{G} \cup \{((u, d_1), (v, d_2))\}$;
14              $R_{local} = R_{local} \cup \{((u, d_1), (v, d_2))\}$;


Example 7. Consider the graph $G$ and tree decomposition $T$ given in Figure 6
and let $D^* = \{\mathbf{0}\}$, i.e. let $\bar{G}$ and $\hat{G}$ be isomorphic to $G$. Figure 7 illustrates the
steps taken by Algorithm 2. In each step, a bag is chosen and a local all-pairs
reachability computation is performed over the bag. Local reachability edges are
added to $R_{local}$ and to $\hat{G}$ (if they are not already in $\hat{G}$).


We now prove the correctness and establish the complexity of Algorithm 2.


Correctness. We prove that when computeLocalReachability($T$) ends, the set $R_{local}$
contains all the local reachability edges between vertices that appear in the
same bag in $T$. The proof is by induction on the size of $T$. If $T$ consists of a
single bag, then the local reachability computation on $H_l$ (lines 7-9) fills $R_{local}$
correctly. Now assume that $T$ has $n$ bags. Let $H_{-l} = \hat{G}[\bigcup_{b_i \in T, i \ne l} V(b_i) \times D^*]$.
Intuitively, $H_{-l}$ is the part of $\hat{G}$ that corresponds to other bags in $T$, i.e. every
bag except the leaf bag $b_l$. After the local reachability computation at lines 7-9,
for any two vertices $(u, d_1)$ and $(v, d_2)$ of $H_{-l}$, the vertex $(v, d_2)$ is reachable
from $(u, d_1)$ in $H_{-l}$ if and only if it is reachable in $\hat{G}$. This is
because (i) the vertices of $H_l$ and $H_{-l}$ form a separation of $\hat{G}$ with separator
$(V(b_l) \cap V(b_p)) \times D^*$ (Lemma 3) and (ii) all reachability information in $H_l$ is
now replaced by direct edges (line 8). Hence, by induction hypothesis, line 11
finds all the local reachability edges for $T - b_l$ and adds them to both $R_{local}$ and
$\hat{G}$. Therefore, after line 11, for every $u, v \in V(b_l)$, we have $(u, d_1) \rightsquigarrow (v, d_2)$ in
$H_l$ iff $(u, d_1) \rightsquigarrow (v, d_2)$ in $\hat{G}$. Hence, the final all-pairs reachability computation
of lines 12-14 adds all the local edges in $b_l$ to $R_{local}$.

Complexity. Algorithm 2 performs at most two local all-pairs reachability
computations over the vertices appearing in each bag, i.e. $O(t \cdot |D^*|)$ vertices. Each
such computation can be performed in $O(t^3 \cdot |D^*|^3)$ using standard reachability
algorithms. Given that the $T_i$'s have $O(n)$ bags overall, the total runtime of
Algorithm 2 is $O(n \cdot t^3 \cdot |D^*|^3) = O(n \cdot |D^*|^3)$. Note that the treewidth $t$ is a
constant and hence the factor $t^3$ can be removed.
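The recursion of Algorithm 2 can be mirrored in a few lines of Python. This is a sketch under assumptions rather than the authors' implementation: it handles a single tree decomposition, represents $\hat{G}$ as a set of edges that is updated in place, treats reachability as reflexive (a path of length zero counts), and uses the bags of Example 6 with an assumed tree shape and a hypothetical directed edge set.

```python
def induced_reachability(edges, bag, facts):
    """All pairs (a, b) over bag x facts with a path a ~> b inside that subgraph."""
    nodes = {(v, d) for v in bag for d in facts}
    succ = {a: set() for a in nodes}
    for a, b in edges:
        if a in nodes and b in nodes:
            succ[a].add(b)
    pairs = set()
    for s in nodes:
        seen, stack = {s}, [s]
        while stack:
            a = stack.pop()
            for b in succ[a] - seen:
                seen.add(b)
                stack.append(b)
        pairs |= {(s, b) for b in seen}
    return pairs


def compute_local_reachability(g_hat, bags, parent, facts, r_local):
    """Mirror of computeLocalReachability; g_hat and r_local are updated in place."""
    has_children = {p for p in parent.values() if p is not None}
    leaf = next(b for b in bags if b not in has_children)       # line 5
    found = induced_reachability(g_hat, bags[leaf], facts)      # lines 7-9
    g_hat |= found
    r_local |= found
    if parent[leaf] is not None:                                # line 10
        rest_bags = {b: s for b, s in bags.items() if b != leaf}
        rest_parent = {b: p for b, p in parent.items() if b != leaf}
        compute_local_reachability(g_hat, rest_bags, rest_parent, facts, r_local)
        found = induced_reachability(g_hat, bags[leaf], facts)  # lines 12-14
        g_hat |= found
        r_local |= found


bags = {"b1": {"v1", "v2", "v5"}, "b2": {"v2", "v3", "v5"},
        "b3": {"v3", "v4", "v5"}, "b4": {"v2", "v6", "v7"}}
parent = {"b1": None, "b2": "b1", "b3": "b2", "b4": "b2"}  # assumed tree shape
facts = [0]
# Hypothetical directed edges; v3 reaches v5 only through v2 and v1.
arcs = [("v3", "v2"), ("v2", "v1"), ("v1", "v5"),
        ("v5", "v4"), ("v7", "v6"), ("v6", "v2")]
original = {((a, 0), (b, 0)) for a, b in arcs}
g_hat = set(original)
r_local = set()
compute_local_reachability(g_hat, bags, parent, facts, r_local)
```

In this instance the pair $(v_3, v_5)$ shares the leaf bag $b_3$, but the only path between them leaves that bag, so it is found only by the second pass (lines 12-14), after the recursive call has added shortcut edges to `g_hat`.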


Step (6): Ancestors Reachability Preprocessing 


This step aims to find reachability relations between each vertex of a bag and
vertices that appear in the ancestors of that bag. As in the previous case, we
compute a set $R_{anc}$ and write $(u, d_1) \rightsquigarrow_{anc} (v, d_2)$ if $((u, d_1), (v, d_2)) \in R_{anc}$.

This step is performed by Algorithm 3. For each bag $b$ and vertex $(u, d)$ such
that $u \in V(b)$ and each $0 \le j \le d_b$, we maintain two sets: $F(u, d, b, j)$ and
$F'(u, d, b, j)$, each containing a set of vertices whose first coordinate is in the
ancestor of $b$ at depth $j$. Intuitively, the vertices in $F(u, d, b, j)$ are reachable
from $(u, d)$. Conversely, $(u, d)$ is reachable from the vertices in $F'(u, d, b, j)$. At
first, all $F$ and $F'$ sets are initialized as $\emptyset$. The algorithm processes each tree
decomposition $T_i$ in a top-down manner and performs the following actions at each bag:


- If a vertex $u$ appears in both $b$ and its parent $b_p$, then the reachability data
  computed for $(u, d)$ at $b_p$ can also be used in $b$. So, the algorithm copies this
  data (lines 4-7).

- If $(u, d_1) \rightsquigarrow_{local} (v, d_2)$, then this reachability relation is saved in $F$ and $F'$
  (lines 10-11). Also, any vertex that is reachable from $(v, d_2)$ is reachable from
  $(u, d_1)$, too. So, the algorithm adds $F(v, d_2, b, j)$ to $F(u, d_1, b, j)$ (line 13). The
  converse happens to $F'$ (line 14).

[Figure panels not reproduced. The bags are processed in the order $\{v_2, v_6, v_7\}$, $\{v_3, v_4, v_5\}$, $\{v_2, v_3, v_5\}$, $\{v_1, v_2, v_5\}$, $\{v_2, v_3, v_5\}$, $\{v_3, v_4, v_5\}$, $\{v_2, v_6, v_7\}$.]

Fig. 7: Local Preprocessing (Step 5) on the graph and decomposition of Figure 6


After the execution of Algorithm 3, we have $(v, d_2) \in F(u, d_1, b, j)$ iff (i) $(v, d_2)$
is reachable from $(u, d_1)$ and (ii) $u \in V(b)$ and $v \in V(a^j_b)$, i.e. $v$ appears in the
ancestor of $b$ at depth $j$. Conversely, $(u, d_1) \in F'(v, d_2, b, j)$ iff (i) $(v, d_2)$ is
reachable from $(u, d_1)$ and (ii) $v \in V(b)$ and $u \in V(a^j_b)$. Algorithm 3 has a runtime of
$O(n \cdot |D|^3 \cdot \log n)$. See [?] for detailed proofs. In the next section, we show that
this runtime can be reduced to $O(n \cdot |D|^3)$ using word tricks.


4.2 Word Tricks

We now show how to reduce the time complexity of Algorithm 3 from
$O(n \cdot |D^*|^3 \cdot \log n)$ to $O(n \cdot |D^*|^3)$ using word tricks. The idea is to pack the $F$ and $F'$
sets of Algorithm 3 into words, i.e. represent them by a binary sequence.

Algorithm 3: Ancestors Preprocessing in Step (6)

 1  foreach $T_i = (\mathfrak{B}_i, E_{T_i})$ do
 2      foreach $b \in \mathfrak{B}_i$ in top-down order do
 3          $b_p \leftarrow$ parent of $b$;
 4          foreach $u \in V(b) \cap V(b_p)$, $d \in D^*$ do
 5              foreach $0 \le j < d_b$ do
 6                  $F(u, d, b, j) \leftarrow F(u, d, b_p, j)$;
 7                  $F'(u, d, b, j) \leftarrow F'(u, d, b_p, j)$;
 8          foreach $u, v \in V(b)$, $d_1, d_2 \in D^*$ do
 9              if $(u, d_1) \rightsquigarrow_{local} (v, d_2)$ then
10                  $F(u, d_1, b, d_b) \leftarrow F(u, d_1, b, d_b) \cup \{(v, d_2)\}$;
11                  $F'(v, d_2, b, d_b) \leftarrow F'(v, d_2, b, d_b) \cup \{(u, d_1)\}$;
12                  foreach $0 \le j < d_b$ do
13                      $F(u, d_1, b, j) \leftarrow F(u, d_1, b, j) \cup F(v, d_2, b, j)$;
14                      $F'(v, d_2, b, j) \leftarrow F'(v, d_2, b, j) \cup F'(u, d_1, b, j)$
15  $R_{anc} \leftarrow \{((u, d_1), (v, d_2)) \mid \exists b, j \; (v, d_2) \in F(u, d_1, b, j) \lor (u, d_1) \in F'(v, d_2, b, j)\}$;



Given a bag $b$, we define $\delta_b$ as the sum of sizes of all ancestors of $b$. The tree
decompositions are balanced, so $b$ has $O(\log n)$ ancestors. Moreover, the width
is $t$, hence $\delta_b = O(t \cdot \log n) = O(\log n)$ for every bag $b$. We perform a top-down
pass of each tree decomposition $T_i$ and compute $\delta_b$ for each $b$.

For every bag $b$, $u \in V(b)$ and $d_1 \in D^*$, we store $F(u, d_1, b, -)$ as a binary
sequence of length $\delta_b \cdot |D^*|$. The first $|V(b)| \cdot |D^*|$ bits of this sequence correspond
to $F(u, d_1, b, d_b)$. The next $|V(b_p)| \cdot |D^*|$ correspond to $F(u, d_1, b, d_b - 1)$, and so
on. We use a similar encoding for $F'$. Using this encoding, Algorithm 3 can be
rewritten by word tricks and bitwise operations as follows:

- Lines 5-6 copy $F(u, d, b_p, -)$ into $F(u, d, b, -)$. However, we have to shift and
  align the bits, so these lines can be replaced by

  $F(u, d, b, -) \leftarrow F(u, d, b_p, -) \ll |V(b)| \cdot |D^*|$;

- Line 10 sets a single bit to 1.

- Lines 12-13 perform a union, which can be replaced by the bitwise OR
  operation. Hence, these lines can be replaced by

  $F(u, d_1, b, -) \leftarrow F(u, d_1, b, -) \text{ OR } F(v, d_2, b, -)$;

- Computations on $F'$ can be handled similarly.


Note that we do not need to compute $R_{anc}$ explicitly given that our queries
can be written in terms of the $F$ and $F'$ sets. It is easy to verify that using these
word tricks, every $W$ operations in lines 6, 7, 13 and 14 are replaced by one or
two bitwise operations on words, where $W = \Theta(\log n)$ is the word size. Hence,
the overall runtime of Algorithm 3 is reduced to
$O\left(\frac{n \cdot |D^*|^3 \cdot \log n}{W}\right) = O(n \cdot |D^*|^3)$.
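The three word operations (shift to inherit from the parent, setting one bit, OR for union) can be tried out with Python integers standing in for packed words. The bit layout follows the encoding described above, with the block of the bag itself in the lowest-order bits; the helper names, the position of a pair inside a block, and the two small bags are assumptions made for illustration.

```python
def bit_index(bag, facts, v, d):
    """Position of the pair (v, d) inside the block of one bag (assumed layout)."""
    return bag.index(v) * len(facts) + facts.index(d)


def inherit_from_parent(parent_word, child_bag, facts):
    """Lines 5-6: shift the parent's word past the child's own block."""
    return parent_word << (len(child_bag) * len(facts))


def contains(word, offset, bag, facts, v, d):
    """Test the bit of (v, d) in the block that starts at bit `offset`."""
    return (word >> (offset + bit_index(bag, facts, v, d))) & 1 == 1


facts = [0, "x"]
parent_bag = ["v1", "v2"]        # hypothetical bag b_p
child_bag = ["v2", "v3", "v4"]   # hypothetical bag b, a child of b_p
block = len(child_bag) * len(facts)

# F(u, d, b_p, -) currently holds the single pair (v1, x).
f_parent = 1 << bit_index(parent_bag, facts, "v1", "x")

# Lines 5-6: copy into the child with a shift.
f_child = inherit_from_parent(f_parent, child_bag, facts)

# Line 10: record a local reachability to (v3, 0) by setting one bit.
f_child |= 1 << bit_index(child_bag, facts, "v3", 0)

# Lines 12-13: union with another word F(v, d2, b, -) by bitwise OR.
f_other = 1 << bit_index(child_bag, facts, "v4", "x")
f_child |= f_other
```

Python integers have arbitrary precision, so one integer here plays the role of a whole sequence of machine words; an implementation on fixed-width words would apply the same operations word by word.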

4.3 Answering Queries


We now describe how to answer pair and single-source queries using the data 
saved in the preprocessing phase. 


Answering a Pair Query. Our algorithm answers a pair query from a vertex
$(u, d_1)$ to a vertex $(v, d_2)$ as follows:
(i) If $u$ and $v$ are not in the same flow graph, return 0 (no).
(ii) Otherwise, let $G_i$ be the flow graph containing both $u$ and $v$. Let $b_u = rb(u)$
and $b_v = rb(v)$ be the root bags of $u$ and $v$ in $T_i$ and let $b = lca(b_u, b_v)$.
(iii) If there exists a vertex $w \in V(b)$ and $d_3 \in D^*$ such that $(u, d_1) \rightsquigarrow_{anc} (w, d_3)$
and $(w, d_3) \rightsquigarrow_{anc} (v, d_2)$, return 1 (yes), otherwise return 0 (no).


Correctness. If there is a path $P : (u, d_1) \rightsquigarrow (v, d_2)$, then we claim $P$ must pass
through a vertex $(w, d_3)$ with $w \in V(b)$. If $b = b_u$ or $b = b_v$, the claim is obviously
true. Otherwise, consider the path $P' : b_u \rightsquigarrow b_v$ in the tree decomposition $T_i$.
This path passes through $b$ (by definition of $b$). Let $e = \{b, b'\}$ be an edge of $P'$.
Applying the cut property (Lemma 3) to $e$ proves that $P$ must pass through a
vertex $(w, d_3)$ with $w \in V(b') \cap V(b)$. Moreover, $b$ is an ancestor of both $b_u$ and
$b_v$, hence we have $(u, d_1) \rightsquigarrow_{anc} (w, d_3)$ and $(w, d_3) \rightsquigarrow_{anc} (v, d_2)$.


Complexity. Computing LCA takes $O(1)$ time. Checking all possible vertices
$(w, d_3)$ takes $O(t \cdot |D^*|) = O(|D|)$. This runtime can be decreased to
$O\left(\left\lceil \frac{|D|}{\log n} \right\rceil\right)$ by word tricks.
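Steps (ii) and (iii) of the pair query can be sketched in Python as follows. The sketch makes several simplifying assumptions: it covers a single flow graph (so step (i) is omitted), computes the LCA by a naive walk instead of Lemma 1, obtains $R_{anc}$ by brute force instead of Steps (5)-(6), treats reachability as reflexive, and reuses the bags of Example 6 with an assumed tree shape and hypothetical directed edges.

```python
def reach_from(edges, source):
    """Vertices reachable from source, including source itself."""
    seen, stack = {source}, [source]
    while stack:
        a = stack.pop()
        for x, y in edges:
            if x == a and y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


def ancestors_or_self(parent, b):
    """The bag b followed by its ancestors up to the root."""
    chain = []
    while b is not None:
        chain.append(b)
        b = parent[b]
    return chain


def lca(parent, b1, b2):
    """Lowest common ancestor of two bags by a naive walk."""
    above_b1 = set(ancestors_or_self(parent, b1))
    return next(b for b in ancestors_or_self(parent, b2) if b in above_b1)


def root_bag(parent, bags, v):
    """rb(v): the bag of smallest depth that contains v."""
    holding = [b for b in bags if v in bags[b]]
    return min(holding, key=lambda b: len(ancestors_or_self(parent, b)))


def brute_force_r_anc(edges, bags, parent, facts):
    """Stand-in for Steps (5)-(6): reachability between a bag and its ancestors."""
    r_anc = set()
    for b in bags:
        for a in ancestors_or_self(parent, b):
            for x in [(u, d) for u in bags[b] for d in facts]:
                for y in [(v, d) for v in bags[a] for d in facts]:
                    if y in reach_from(edges, x):
                        r_anc.add((x, y))
                    if x in reach_from(edges, y):
                        r_anc.add((y, x))
    return r_anc


def pair_query(u, d1, v, d2, bags, parent, facts, r_anc):
    """Steps (ii)-(iii): look for a midpoint (w, d3) in the LCA of the root bags."""
    b = lca(parent, root_bag(parent, bags, u), root_bag(parent, bags, v))
    return any(((u, d1), (w, d3)) in r_anc and ((w, d3), (v, d2)) in r_anc
               for w in bags[b] for d3 in facts)


bags = {"b1": {"v1", "v2", "v5"}, "b2": {"v2", "v3", "v5"},
        "b3": {"v3", "v4", "v5"}, "b4": {"v2", "v6", "v7"}}
parent = {"b1": None, "b2": "b1", "b3": "b2", "b4": "b2"}  # assumed tree shape
facts = [0]
arcs = [("v3", "v2"), ("v2", "v1"), ("v1", "v5"),
        ("v5", "v4"), ("v7", "v6"), ("v6", "v2")]           # hypothetical edges
edges = {((a, 0), (b, 0)) for a, b in arcs}
r_anc = brute_force_r_anc(edges, bags, parent, facts)
```

For instance, the query from $(v_7, \mathbf{0})$ to $(v_4, \mathbf{0})$ inspects the bag $b_2 = lca(b_4, b_3)$ and succeeds through the midpoint $v_2$.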


Answering a Single-source Query. Consider a single-source query from a vertex
$(u, d_1)$ with $u \in V_i$. We can answer this query by performing $|V_i| \times |D^*|$ pair
queries, i.e. by performing one pair query from $(u, d_1)$ to $(v, d_2)$ for each $v \in V_i$
and $d_2 \in D^*$. Since $|D^*| = O(|D|)$, the total complexity is
$O\left(|V_i| \cdot |D| \cdot \left\lceil \frac{|D|}{\log n} \right\rceil\right)$
for answering a single-source query. Using a more involved preprocessing method,
we can slightly improve this time to $O\left(\frac{n \cdot |D|^2}{\log n}\right)$. See [?] for more details. Based
on the results above, we now present our main theorem:


Theorem 1. Given an IFDS instance $I = (G, D, F, M, \cup)$, our algorithm
preprocesses $I$ in time $O(n \cdot |D|^3)$ and can then answer each pair query and
single-source query in time

$$O\left(\left\lceil \frac{|D|}{\log n} \right\rceil\right) \quad \text{and} \quad O\left(\frac{n \cdot |D|^2}{\log n}\right), \text{ respectively.}$$


4.4 Parallelizability and Optimality 


We now turn our attention to parallel versions of our query algorithms, as well 
as cases where the algorithms are optimal. 


Parallelizability. Assume we have $k$ threads at our disposal.

1. Given a pair query of the form $(u, d_1, v, d_2)$, let $b_u$ (resp. $b_v$) be the root
bag of $u$ (resp. $v$), and $b = lca(b_u, b_v)$ the lowest common ancestor of $b_u$ and
$b_v$. We partition the set $V(b) \times D^*$ into $k$ subsets $\{A_i\}_{1 \le i \le k}$. Then, thread
$i$ handles the set $A_i$, as follows: for every pair $(w, d_3) \in A_i$, the thread sets
the output to 1 (yes) iff $(u, d_1) \rightsquigarrow_{anc} (w, d_3)$ and $(w, d_3) \rightsquigarrow_{anc} (v, d_2)$.

2. Recall that a single-source query $(u, d_1)$ is answered by breaking it down to
$|V_i| \times |D^*|$ pair queries, where $G_i$ is the flow graph containing $u$. Since all
such pair queries are independent, we parallelize them among $k$ threads, and
further parallelize each pair query as described above.

With word tricks, parallel pair and single-source queries require
$O\left(\left\lceil \frac{|D|}{k \cdot \log n} \right\rceil\right)$ and $O\left(\left\lceil \frac{n \cdot |D|^2}{k \cdot \log n} \right\rceil\right)$
time, respectively. Hence, for large enough $k$, each query requires only $O(1)$ time,
and we achieve perfect parallelism.

Optimality. Observe that when |D| = O(1), i.e. when the domain is small, our 
algorithm is optimal: the preprocessing runs in O(n), which is proportional to 
the size of the input, and the pair query and single-source query run in times 
O(1) and O(n/logn), respectively, each case being proportional to the size of 
the output. Small domains arise often in practice, e.g. in dead-code elimination 
or null-pointer analysis. 


5 Experimental Results 


We report on an experimental evaluation of our techniques and compare their 
performance to standard alternatives in the literature. 


Benchmarks. We used 5 classical data-flow analyses in our experiments, including
reachability (for dead-code elimination), possibly-uninitialized variables
analysis, simple uninitialized variables analysis, liveness analysis of the variables, and
reaching-definitions analysis. We followed the specifications in [36] for
modeling the analyses in IFDS. We used real-world Java programs from the DaCapo
benchmark suite [6], obtained their flow graphs using Soot [65] and applied the
JTDec tool for computing balanced tree decompositions. Given that some
of these benchmarks are prohibitively large, we only considered their main Java
packages, i.e. packages containing the starting point of the programs. We
experimented with a total of 22 benchmarks, which, together with the 5 analyses
above, led to a total of 110 instances. Our instance sizes, i.e. number of vertices
and edges in the exploded supergraph, range from 22 to 190,591. See [17] for
details.


Implementation and comparison. We implemented both variants of our approach,
i.e. sequential and parallel, in C++. We also implemented the parts of the
classical IFDS algorithm [50] and its on-demand variant [36] responsible for
same-context queries. All of our implementations closely follow the pseudocodes of
our algorithms and the ones in [50, 36], and no additional optimizations are
applied. We compared the performance of the following algorithms for
randomly-generated queries:

- SEQ. The sequential variant of our algorithm.

- PAR. A variant of our algorithm in which the queries are answered using
  perfect parallelization and 12 threads.

- NOPP. The classical same-context IFDS algorithm of [50], with no
  preprocessing. NOPP performs a complete run of the classic IFDS algorithm for
  each query.

- CPP. The classical same-context IFDS algorithm of [50], with complete
  preprocessing. In this algorithm, all summary edges and reachability information
  are precomputed and the queries are simple table lookups.

- OD. The on-demand same-context IFDS algorithm of [36]. This algorithm
  does not preprocess the input. However, it remembers the information
  obtained in each query and uses it to speed up the following queries.


For each instance, we randomly generated 10,000 pair queries and 100 single- 
source queries. In case of single-source queries, source vertices were chosen uni- 
formly at random. For pair queries, we first chose a source vertex uniformly at 
random, and then chose a target vertex in the same procedure, again uniformly 
at random. 


Experimental setting. The results were obtained on Debian using an Intel Xeon 
E5-1650 processor (3.2 GHz, 6 cores, 12 threads) with 128GB of RAM. The 
parallel results used all 12 threads. 


Time limit. We enforced a preprocessing time limit of 5 minutes per instance. 
This is in line with the preprocessing times of state-of-the-art tools on bench- 
marks of this size, e.g. Soot takes 2-3 minutes to generate all flow graphs for 
each benchmark. 

Fig. 8: Preprocessing times of CPP and SEQ/PAR (over all instances). A dot
above the 300s line denotes a timeout.

Results. We found that, except for the smallest instances, our algorithm
consistently outperforms all previous approaches. Our results were as follows:


Treewidth. The maximum width amongst the obtained tree decompositions
was 9, while the minimum was 1. Hence, our experiments confirm the results
of [?] and show that real-world Java programs have small treewidth.
See [?] for more details.

Preprocessing Time. As shown in Figure 8, our preprocessing is more lightweight
and scalable than CPP. Note that CPP preprocessing times out at 25 of
the 110 instances, starting with instances of size below 50,000, whereas our
approach can comfortably handle instances of size 200,000. Although the
theoretical worst-case complexity of CPP preprocessing is $O(n^2 \cdot |D|^3)$, we
observed that its runtime over our benchmarks grows more slowly. We believe
this is because our benchmark programs generally consist of a large number
of small procedures. Hence, the worst-case behavior of CPP preprocessing,
which happens on instances with large procedures, is not captured by the
DaCapo benchmarks. In contrast, our preprocessing time is $O(n \cdot |D|^3)$ and
having small or large procedures does not matter to our algorithms. Hence,
we expect that our approach would outperform CPP preprocessing more
significantly on instances containing large functions. However, as Figure 8
demonstrates, our approach is faster even on instances with small procedures.

Query Time. As expected, in terms of pair query time, NOPP is the worst 
performer by a large margin, followed by OD, which is in turn far less 
efficient than CPP, PAR and SEQ (Figure 9, top). This illustrates the 
underlying trade-off between preprocessing and query-time performance. Note 
that both CPP and our algorithms (SEQ and PAR) answer each pair query 
in O(1). They all have pair-query times of less than a millisecond and are 
indistinguishable in this case. The same trade-off appears in single-source 
queries as well (Figure 9, bottom). Again, NOPP is the worst performer, 
followed by OD. SEQ and CPP have very similar runtimes, except that SEQ 
outperforms CPP in some cases, due to word tricks. However, PAR is 
far faster, which leads to the next point. 

Parallelization. In Figure 9 (bottom right), we also observe that single-source 
queries are handled considerably faster by PAR in comparison with SEQ. 
Specifically, using 12 threads, the average single-source query time is re- 
duced by a factor of 11.8. Hence, our experimental results achieve near- 
perfect parallelism and confirm that our algorithm is well-suited for parallel 
architectures. 
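To put the reported speedup in perspective, the parallel efficiency (achieved speedup divided by the number of threads) can be computed directly from the two figures quoted above. The helper name `parallel_efficiency` is introduced here only for illustration.

```python
def parallel_efficiency(speedup: float, threads: int) -> float:
    """Fraction of the ideal linear speedup that was actually achieved."""
    if threads <= 0:
        raise ValueError("threads must be positive")
    return speedup / threads


# Figures reported in the text: a speedup factor of 11.8 using 12 threads.
efficiency = parallel_efficiency(11.8, 12)
print(f"{efficiency:.1%}")  # prints 98.3%
```

An efficiency this close to 100% is what the text refers to as near-perfect parallelism.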


Note that Figure 9 combines the results of all five mentioned data-flow 
analyses. However, the observations above hold independently for every single 
analysis as well. See [17] for analysis-specific figures. 
Fig. 9: Comparison of pair query time (top row) and single source query time 
(bottom row) of the algorithms. Each dot represents one of the 110 instances. 
Each row starts with a global picture (left) and zooms into smaller time units 
(right) to differentiate between the algorithms. The plots above contain results 
over all five analyses. However, our observations hold independently for every 
single analysis as well (see [17]). 


6 Conclusion 


We developed new techniques for on-demand data-flow analyses in IFDS, by 
exploiting the treewidth of flow graphs. Our complexity analysis shows that our 
techniques (i) have better worst-case complexity, (ii) offer certain optimality 
guarantees, and (iii) are embarrassingly parallelizable. Our experiments 
demonstrate these improvements in practice: after a lightweight one-time 
preprocessing, queries are answered as fast as with the heavyweight complete 
preprocessing, and the parallel speedup is close to its theoretical optimum. The 
main limitation of our approach is that it only handles same-context queries. 
Using treewidth to speed up non-same-context queries is a challenging direction 
of future work. 


Optimal and Parallel On-demand Data-flow Analysis 137 


References 


1. T. J. Watson libraries for analysis (WALA). https://github.com/wala/WALA 
(2003) 

2. Appel, A.W., Palsberg, J.: Modern Compiler Implementation in Java. Cambridge 
University Press, 2nd edn. (2003) 

3. Arzt, S., Rasthofer, S., Fritz, C., Bodden, E., Bartel, A., Klein, J., Le Traon, Y., 
Octeau, D., McDaniel, P.: FlowDroid: Precise context, flow, field, object-sensitive 
and lifecycle-aware taint analysis for android apps. In: PLDI. pp. 259-269 (2014) 

4. Babich, W.A., Jazayeri, M.: The method of attributes for data flow analysis. Acta 
Informatica 10(3) (1978) 

5. Bebenita, M., Brandner, F., Fahndrich, M., Logozzo, F., Schulte, W., Tillmann, N., 
Venter, H.: Spur: A trace-based JIT compiler for CIL. In: OOPSLA. pp. 708-725 
(2010) 

6. Blackburn, S.M., Garner, R., Hoffman, C., Khan, A.M., McKinley, K.S., Bentzur, 
R., Diwan, A., Feinberg, D., Frampton, D., Guyer, S.Z., Hirzel, M., Hosking, A., 
Jump, M., Lee, H., Moss, J.E.B., Phansalkar, A., Stefanović, D., VanDrunen, T., 
von Dincklage, D., Wiedermann, B.: The DaCapo benchmarks: Java benchmarking 
development and analysis. In: OOPSLA. pp. 169-190 (2006) 

7. Bodden, E.: Inter-procedural data-flow analysis with IFDS/IDE and soot. In: 
SOAP. pp. 3-8 (2012) 

8. Bodden, E., Tolêdo, T., Ribeiro, M., Brabrand, C., Borba, P., Mezini, M.: Spllift: 
Statically analyzing software product lines in minutes instead of years. In: PLDI. 
pp. 355-364 (2013) 

9. Bodlaender, H., Gustedt, J., Telle, J.A.: Linear-time register allocation for a fixed 
number of registers. In: SODA (1998) 

10. Bodlaender, H.L.: A linear-time algorithm for finding tree-decompositions of small 
treewidth. SIAM Journal on Computing 25(6), 1305-1317 (1996) 

11. Bodlaender, H.L., Hagerup, T.: Parallel algorithms with optimal speedup for 
bounded treewidth. SIAM Journal on Computing 27(6), 1725-1746 (1998) 

12. Burgstaller, B., Blieberger, J., Scholz, B.: On the tree width of ada programs. In: 
Ada-Europe. pp. 78-90 (2004) 

13. Callahan, D., Cooper, K.D., Kennedy, K., Torczon, L.: Interprocedural constant 
propagation. In: CC (1986) 

14. Chatterjee, K., Choudhary, B., Pavlogiannis, A.: Optimal dyck reachability for 
data-dependence and alias analysis. In: POPL. pp. 30:1-30:30 (2017) 

15. Chatterjee, K., Goharshady, A., Goharshady, E.: The treewidth of smart contracts. 
In: SAC (2019) 

16. Chatterjee, K., Goharshady, A.K., Goyal, P., Ibsen-Jensen, R., Pavlogiannis, A.: 
Faster algorithms for dynamic algebraic queries in basic RSMs with constant 
treewidth. ACM Transactions on Programming Languages and Systems 41(4), 
1-46 (2019) 

17. Chatterjee, K., Goharshady, A.K., Ibsen-Jensen, R., Pavlogiannis, A.: Optimal 
and perfectly parallel algorithms for on-demand data-flow analysis. arXiv preprint 
2001.11070 (2020) 

18. Chatterjee, K., Goharshady, A.K., Okati, N., Pavlogiannis, A.: Efficient 
parameterized algorithms for data packing. In: POPL. pp. 1-28 (2019) 

19. Chatterjee, K., Goharshady, A.K., Pavlogiannis, A.: JTDec: A tool for tree 
decompositions in soot. In: ATVA. pp. 59-66 (2017) 


20. Chatterjee, K., Ibsen-Jensen, R., Goharshady, A.K., Pavlogiannis, A.: Algorithms 
for algebraic path properties in concurrent systems of constant treewidth 
components. ACM Transactions on Programming Languages and Systems 40(3), 9 
(2018) 

21. Chatterjee, K., Ibsen-Jensen, R., Pavlogiannis, A.: Optimal reachability and a 
space-time tradeoff for distance queries in constant-treewidth graphs. In: ESA 
(2016) 

22. Chaudhuri, S., Zaroliagis, C.D.: Shortest paths in digraphs of small treewidth. part 
i: Sequential algorithms. Algorithmica 27(3-4), 212-226 (2000) 

23. Chaudhuri, S.: Subcubic algorithms for recursive state machines. In: POPL (2008) 

24. Chen, T., Lin, J., Dai, X., Hsu, W.C., Yew, P.C.: Data dependence profiling for 
speculative optimizations. In: CC. pp. 57-72 (2004) 

25. Cousot, P., Cousot, R.: Static determination of dynamic properties of recursive 
procedures. In: IFIP Conference on Formal Description of Programming Concepts 
(1977) 

26. Cygan, M., Fomin, F.V., Kowalik, L., Lokshtanov, D., Marx, D., Pilipczuk, M., 
Pilipczuk, M., Saurabh, S.: Parameterized algorithms, vol. 4 (2015) 

27. Duesterwald, E., Gupta, R., Soffa, M.L.: Demand-driven computation of 
interprocedural data flow. POPL (1995) 

28. Dutta, S.: Anatomy of a compiler. Circuit Cellar 121, 30-35 (2000) 

29. Flückiger, O., Scherer, G., Yee, M.H., Goel, A., Ahmed, A., Vitek, J.: Correctness 
of speculative optimizations with dynamic deoptimization. In: POPL. pp. 49:1-49:28 
(2017) 

30. Giegerich, R., Möncke, U., Wilhelm, R.: Invariance of approximate semantics with 
respect to program transformations. In: ECI (1981) 

31. Gould, C., Su, Z., Devanbu, P.: Jdbc checker: A static analysis tool for SQL/JDBC 
applications. In: ICSE. pp. 697-698 (2004) 

32. Grove, D., Torczon, L.: Interprocedural constant propagation: A study of jump 
function implementation. In: PLDI (1993) 

33. Guarnieri, S., Pistoia, M., Tripp, O., Dolby, J., Teilhet, S., Berg, R.: Saving the 
world wide web from vulnerable javascript. In: ISSTA. pp. 177-187 (2011) 

34. Gustedt, J., Mæhle, O.A., Telle, J.A.: The treewidth of java programs. In: 
ALENEX. pp. 86-97 (2002) 

35. Harel, D., Tarjan, R.E.: Fast algorithms for finding nearest common ancestors. 
SIAM Journal on Computing 13(2), 338-355 (1984) 

36. Horwitz, S., Reps, T., Sagiv, M.: Demand interprocedural dataflow analysis. ACM 
SIGSOFT Software Engineering Notes (1995) 

37. Hovemeyer, D., Pugh, W.: Finding bugs is easy. ACM SIGPLAN Notices 39(12), 
92-106 (Dec 2004) 

38. Klaus Krause, P., Larisch, L., Salfelder, F.: The tree-width of C. Discrete Applied 
Mathematics (03 2019) 

39. Knoop, J., Steffen, B.: The interprocedural coincidence theorem. In: CC (1992) 

40. Krüger, S., Späth, J., Ali, K., Bodden, E., Mezini, M.: CrySL: An Extensible 
Approach to Validating the Correct Usage of Cryptographic APIs. In: ECOOP. 
pp. 10:1-10:27 (2018) 

41. Lee, Y.f., Marlowe, T.J., Ryder, B.G.: Performing data flow analysis in parallel. 
In: ACM/IEEE Supercomputing. pp. 942-951 (1990) 

42. Lee, Y.F., Ryder, B.G.: A comprehensive approach to parallel data flow analysis. 
In: ICS. pp. 236-247 (1992) 


43. Lin, J., Chen, T., Hsu, W.C., Yew, P.C., Ju, R.D.C., Ngai, T.F., Chan, S.: A 
compiler framework for speculative optimizations. ACM Transactions on Architecture 
and Code Optimization 1(3), 247-271 (2004) 

44. Muchnick, S.S.: Advanced Compiler Design and Implementation. Morgan 
Kaufmann (1997) 

45. Naeem, N.A., Lhoták, O., Rodriguez, J.: Practical extensions to the ifds algorithm. 
CC (2010) 

46. Nanda, M.G., Sinha, S.: Accurate interprocedural null-dereference analysis for java. 
In: ICSE. pp. 133-143 (2009) 

47. Rapoport, M., Lhotak, O., Tip, F.: Precise data flow analysis in the presence of 
correlated method calls. In: SAS. pp. 54-71 (2015) 

48. Reps, T.: Program analysis via graph reachability. ILPS (1997) 

49. Reps, T.: Undecidability of context-sensitive data-dependence analysis. ACM 
Transactions on Programming Languages and Systems 22(1), 162-186 (2000) 

50. Reps, T., Horwitz, S., Sagiv, M.: Precise interprocedural dataflow analysis via 
graph reachability. In: POPL. pp. 49-61 (1995) 

51. Reps, T.: Demand interprocedural program analysis using logic databases. In: 
Applications of Logic Databases, vol. 296 (1995) 

52. Robertson, N., Seymour, P.D.: Graph minors. iii. planar tree-width. Journal of 
Combinatorial Theory, Series B 36(1), 49-64 (1984) 

53. Rodriguez, J., Lhoták, O.: Actor-based parallel dataflow analysis. In: CC. 
pp. 179-197 (2011) 

54. Rountev, A., Kagan, S., Marlowe, T.: Interprocedural dataflow analysis in the 
presence of large libraries. In: CC. pp. 2-16 (2006) 

55. Sagiv, M., Reps, T., Horwitz, S.: Precise interprocedural dataflow analysis with 
applications to constant propagation. Theoretical Computer Science (1996) 

56. Schubert, P.D., Hermann, B., Bodden, E.: PhASAR: An inter-procedural static 
analysis framework for C/C++. In: TACAS. pp. 393-410 (2019) 

57. Shang, L., Xie, X., Xue, J.: On-demand dynamic summary-based points-to 
analysis. In: CGO. pp. 264-274 (2012) 

58. Sharir, M., Pnueli, A.: Two approaches to interprocedural data flow analysis. In: 
Program flow analysis: Theory and applications. Prentice-Hall (1981) 

59. Smaragdakis, Y., Bravenboer, M., Lhoták, O.: Pick your contexts well: 
Understanding object-sensitivity. In: POPL. pp. 17-30 (2011) 

60. Spath, J., Ali, K., Bodden, E.: Context-, flow-, and field-sensitive data-flow analysis 
using synchronized pushdown systems. In: POPL. pp. 48:1-48:29 (2019) 

61. Sridharan, M., Bodik, R.: Refinement-based context-sensitive points-to analysis for 
java. ACM SIGPLAN Notices 41(6), 387-400 (2006) 

62. Sridharan, M., Gopan, D., Shan, L., Bodik, R.: Demand-driven points-to analysis 
for java. In: OOPSLA. pp. 59-76 (2005) 

63. Thorup, M.: All structured programs have small tree width and good register 
allocation. Information and Computation 142(2), 159-181 (1998) 

64. Torczon, L., Cooper, K.: Engineering a Compiler. Morgan Kaufmann, 2nd edn. 
(2011) 

65. Vallée-Rai, R., Co, P., Gagnon, E., Hendren, L.J., Lam, P., Sundaresan, V.: Soot 
- a Java bytecode optimization framework. In: CASCON. p. 13 (1999) 

66. Xu, G., Rountev, A., Sridharan, M.: Scaling cfl-reachability-based points-to 
analysis using context-sensitive must-not-alias analysis. In: ECOOP (2009) 

67. Yan, D., Xu, G., Rountev, A.: Demand-driven context-sensitive alias analysis for 
java. In: ISSTA. pp. 155-165 (2011) 




68. Yuan, X., Gupta, R., Melhem, R.: Demand-driven data flow analysis for commu- 
nication optimization. Parallel Processing Letters 07(04), 359-370 (1997) 

69. Zheng, X., Rugina, R.: Demand-driven alias analysis for c. In: POPL. pp. 197-208 
(2008) 


Open Access This chapter is licensed under the terms of the Creative Commons 
Attribution 4.0 International License (http://creativecommons.org/licenses/by/ 
4.0/), which permits use, sharing, adaptation, distribution and reproduction in any 
medium or format, as long as you give appropriate credit to the original author(s) and 
the source, provide a link to the Creative Commons license and indicate if changes 
were made. 

The images or other third party material in this chapter are included in the chapter’s 
Creative Commons license, unless indicated otherwise in a credit line to the material. If 
material is not included in the chapter’s Creative Commons license and your intended 
use is not permitted by statutory regulation or exceeds the permitted use, you will need 
to obtain permission directly from the copyright holder. 




Concise Read-Only Specifications for 
Better Synthesis of Programs with Pointers 


Andreea Costea¹, Amy Zhu²,*, Nadia Polikarpova³, and Ilya Sergey⁴,¹ 


1 School of Computing, National University of Singapore, Singapore 
2 University of British Columbia, Vancouver, Canada 
3 University of California, San Diego, USA 
4 Yale-NUS College, Singapore 


Abstract. In program synthesis there is a well-known trade-off between 
concise and strong specifications: if a specification is too verbose, it might 
be harder to write than the program; if it is too weak, the synthesised 
program might not match the user’s intent. In this work we explore the 
use of annotations for restricting memory access permissions in program 
synthesis, and show that they can make specifications much stronger 
while remaining surprisingly concise. Specifically, we enhance Synthetic 
Separation Logic (SSL), a framework for synthesis of heap-manipulating 
programs, with the logical mechanism of read-only borrows. 

We observe that this minimalistic and conservative SSL extension bene- 
fits the synthesis in several ways, making it more (a) expressive (stronger 
correctness guarantees are achieved with a modest annotation overhead), 
(b) effective (it produces more concise and easier-to-read programs), 
(c) efficient (faster synthesis), and (d) robust (synthesis efficiency is 
less affected by the choice of the search heuristic). We explain the 
intuition and provide formal treatment for read-only borrows. We 
substantiate the claims (a)-(d) by describing our quantitative evaluation of 
the borrowing-aware synthesis implementation on a series of standard 
benchmark specifications for various heap-manipulating programs. 


1 Introduction 


Deductive program synthesis is a prominent approach to the generation of correct- 
by-construction programs from their declarative specifications [14, 23, 29, 33]. 
With this methodology, one can represent searching for a program satisfying the 
user-provided constraints as a proof search in a certain logic. Following this idea, 
it has been recently observed [34] that the synthesis of correct-by-construction 
imperative heap-manipulating programs (in a language similar to C) can be im- 
plemented as a proof search in a version of Separation Logic (SL)—a program 
logic designed for modular verification of programs with pointers [32,37]. 

SL-based deductive program synthesis based on Synthetic Separation Logic 
(SSL) [34] requires the programmer to provide a Hoare-style specification for a 
program of interest. For instance, given the predicate ls(x, S), which denotes a 
symbolic heap corresponding to a linked list starting at a pointer x, ending with 
null, and containing elements from the set S, one can specify the behaviour of 
the procedure for copying a linked list as follows: 


$\{r \mapsto x * \mathsf{ls}(x, S)\}\ \mathtt{listcopy}(r)\ \{r \mapsto y * \mathsf{ls}(x, S) * \mathsf{ls}(y, S)\}$ (1) 


* Work done during an internship at NUS School of Computing in Summer 2019. 
[Figure: the initial heap described by the precondition of specification (1); the tail of the list is labelled ls(nxt, S').] 
The precondition of specification (1), defining the shape of the initial heap, 
is illustrated by the figure above. It requires the heap to contain a pointer r, 
which is taken by the procedure as an argument and whose stored value, x, is the 
head pointer of the list to be copied. The list itself is described by the symbolic 
heap predicate instance ls(x, S), whose footprint is assumed to be disjoint from 
the entry $r \mapsto x$, following the standard semantics of the separating conjunction 
operator ($*$) [32]. The postcondition asserts that the final heap, in addition to 
containing the original list ls(x, S), will contain a new list starting from y whose 
contents S are the same as those of the original list, and also that the pointer r 
will now point to the head y of the list copy. Our specification is incomplete: it 
allows, for example, duplicating or rearranging elements. One hopes that such a 
program is unlikely to be synthesised. In synthesis, it is common to provide 
incomplete specs: writing complete ones can be as hard as writing the program itself. 
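The incompleteness of specification (1) can be made concrete with a small executable model. The sketch below is an illustration only: it assumes a heap represented as a Python dictionary from addresses to values, list nodes that occupy two consecutive cells (payload, then next pointer), and helper names `ls` and `satisfies_post` invented for this example. It checks the postcondition of (1) and shows that a reversed copy is accepted, because ls only constrains the set of elements.

```python
def ls(heap, x):
    """Return (footprint, elements) of the null-terminated list starting at x."""
    footprint, elements, visited = set(), set(), set()
    while x != 0:
        if x in visited:
            raise ValueError("cyclic list")
        visited.add(x)
        footprint |= {x, x + 1}      # payload cell and next-pointer cell
        elements.add(heap[x])
        x = heap[x + 1]
    return footprint, elements


def satisfies_post(heap, r, x, S):
    """Check the postcondition r -> y * ls(x, S) * ls(y, S) of spec (1)."""
    y = heap[r]
    fx, sx = ls(heap, x)
    fy, sy = ls(heap, y)
    disjoint = not (fx & fy) and r not in (fx | fy)   # separating conjunction
    exact = ({r} | fx | fy) == set(heap)              # no other cells remain
    return disjoint and exact and sx == S and sy == S


# Original list 1 -> 2 at addresses 10 and 12; the "copy" at 20 and 22 is reversed.
reversed_copy = {1: 20,
                 10: 1, 11: 12, 12: 2, 13: 0,
                 20: 2, 21: 22, 22: 1, 23: 0}
accepted = satisfies_post(reversed_copy, r=1, x=10, S={1, 2})
```

Because the specification compares sets, a copy that reorders the elements still satisfies it, which is the kind of unintended but admissible behaviour the paragraph above describes.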


1.1 Correct Programs that Do Strange Things 


Provided the definition of the heap predicate ls and the specification (1), the 
SuSLik tool, an implementation of the SSL-based synthesis [34], will produce the 
program depicted in Fig. 1. It is easy to check that this program satisfies the 
ascribed spec (1). Moreover, it correctly duplicates the original list, faithfully 
preserving its contents and the ordering. However, an astute reader might notice 
a certain oddity in the way it treats the initial list provided for copying. 
According to the postcondition of (1), the value of the pointer r stored in a local 
immutable variable y1 on line 9 is the head of the copy of the original list's tail. 
Quite unexpectedly, the pointer y1 becomes the tail of the original list on line 11, 
while the original list's tail pointer nxt, once assigned to *(y + 1) on line 13, 
becomes the tail of the copy! 

     1  void listcopy (loc r) {
     2    let x = *r;
     3    if (x == 0) {
     4    } else {
     5      let v = *x;
     6      let nxt = *(x + 1);
     7      *r = nxt;
     8      listcopy(r);
     9      let y1 = *r;
    10      let y = malloc(2);
    11      *(x + 1) = y1;
    12      *r = y;
    13      *(y + 1) = nxt;
    14      *y = v;
    15  } }

Fig. 1: Result program for spec (1) and the shape of its final heap. 

Indeed, the exercise in tail swapping is totally pointless: not only does it 
produce less "natural" and readable code, but the resulting program's locality 
properties are unsatisfactory; for instance, this program cannot be plugged into 
a concurrent setting where multiple threads rely on ls(x, S) to be unchanged. 
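The tail swapping can be observed by running a direct transliteration of the program from Fig. 1 on a simulated heap. This is a sketch under stated assumptions: the heap is a Python dictionary, `malloc` is modelled by a hypothetical counter that hands out fresh even addresses starting at 100, and the helper `nodes` (not part of the synthesised program) lists the addresses and payloads reachable from a list head.

```python
import itertools


def listcopy(heap, r, malloc):
    """Transliteration of the program in Fig. 1; comments give its line numbers."""
    x = heap[r]                    # line 2
    if x == 0:                     # line 3
        return
    v = heap[x]                    # line 5
    nxt = heap[x + 1]              # line 6
    heap[r] = nxt                  # line 7
    listcopy(heap, r, malloc)      # line 8
    y1 = heap[r]                   # line 9: head of the copy of the tail
    y = malloc()                   # line 10
    heap[x + 1] = y1               # line 11: original node now points into the copy
    heap[r] = y                    # line 12
    heap[y + 1] = nxt              # line 13: new node now points into the original
    heap[y] = v                    # line 14


def nodes(heap, head):
    """List (address, payload) pairs reachable from head."""
    out = []
    while head != 0:
        out.append((head, heap[head]))
        head = heap[head + 1]
    return out


fresh = itertools.count(100, 2)                  # hypothetical allocator
heap = {1: 10, 10: 1, 11: 12, 12: 2, 13: 0}      # r = 1, list 1 -> 2 at 10 and 12
listcopy(heap, 1, lambda: next(fresh))

original = nodes(heap, 10)       # [(10, 1), (100, 2)]
copy = nodes(heap, heap[1])      # [(102, 1), (12, 2)]
```

Both lists still carry the payloads 1 and 2 in order, yet the list starting at the original head now runs through a freshly allocated node (address 100), while the copy runs through a node of the original list (address 12).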

The issue with the result in Fig. 1 is caused by specification (1) being too 
permissive: it does not prevent the synthesised program from modifying the 
structure of the initial list while creating its copy. Luckily, the SL community has 
devised a number of SL extensions that allow one to impose such restrictions, like 
declaring a part of the provided symbolic heap as read-only [5, 8, 9, 11, 15, 20, 21], 
i.e., as a part that the specified code is forbidden to modify. 


1.2 Towards Simple Read-Only Specifications for Synthesis 


The main challenge of introducing read-only annotations (commonly also 
referred to as permissions)⁵ into Separation Logic lies in establishing the 
discipline for performing sound accounting in the presence of mixed read-only and 
mutating heap accesses by different components of a program. 

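As an example, consider a simple symbolic heap 
$\{x \overset{\mathsf{M}}{\mapsto} f * r \overset{\mathsf{M}}{\mapsto} h\}$ that declares 
two mutable (i.e., allowed to be written to) pointers x and r, that point to 
unspecified values f and h, correspondingly. With this symbolic heap, is it safe 
to call the following function that modifies the contents of r but not of x? 


$\{x \overset{\mathsf{RO}}{\mapsto} f * r \overset{\mathsf{M}}{\mapsto} h\}\ \mathtt{readX}(x, r)\ \{x \overset{\mathsf{RO}}{\mapsto} f * r \overset{\mathsf{M}}{\mapsto} f\}$ (2) 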


The precondition of readX requires a weaker form of access permission for x 
(read-only, RO), while the considered heap asserts a stronger write permission 
(M). It should be possible to satisfy readX’s requirement by providing the nec- 
essary read-only permission for x. To do so, we need to agree on a discipline to 
“adapt” the caller’s write-permission M to the callee’s read-only permission RO. 
While seemingly trivial, if implemented naively, accounting of RO permissions in 
SL might compromise either soundness or completeness of the logical reasoning. 

A number of proposals for logically sound interplay between write- and 
read-only access permissions in the presence of function calls have been described 
in the literature [7-9, 11, 13, 20, 30]. Some of these works manage to maintain the 
simplicity of having only mutable/read-only annotations when confined to the 
sequential setting [9, 11, 13]. More general (but harder to implement) approaches 
rely on fractional permissions [8, 25], an expressive mechanism for permission 
accounting, with primary applications in concurrent reasoning [7, 28]. We started 
this project by attempting to adapt some of those logics [9, 11, 13] as an extension 
of SSL in order to reap the benefits of read-only annotations for the synthesis 
of sequential programs. The main obstacle we encountered involved definitions 
of inductive heap predicates with mixed permissions. For instance, how can one 
specify a program that modifies the contents of a linked list, but not its 
structure? Even though it seemed possible to enable this treatment of predicates via 
permission multiplication [25], developing support for this machinery on top of 
existing SuSLik infrastructure was a daunting task. Therefore, we had to look 
for a technically simpler solution. 


5 We will be using the words “annotation” and “permission” interchangeably. 
1.3 Our Contributions 


Theoretical Contributions. Our main conceptual innovation is the idea of 
instrumenting SSL with symbolic read-only borrows to enable faster and more 
predictable program synthesis. Borrows are used to annotate symbolic heaps 
in specifications, similarly to abstract fractional permissions from the deductive 
verification tools, such as Chalice and VeriFast [20, 21, 27]. They enable simple 
but principled lightweight threading of heap access permissions from the callers 
to callees and back, while enforcing read-only access whenever it is required. For 
basic intuition on read-only borrows, consider the specification below: 

$\{x \overset{a}{\mapsto} f * y \overset{b}{\mapsto} g * r \overset{\mathsf{M}}{\mapsto} h\}\ \mathtt{readXY}(x, y, r)\ \{x \overset{a}{\mapsto} f * y \overset{b}{\mapsto} g * r \overset{\mathsf{M}}{\mapsto} (f + g)\}$ (3) 

The precondition requires a heap with three pointers, x, y, and r, pointing to 
unspecified f, g, and h, correspondingly. Both x and y are going to be treated as 
read-only, but now, instead of simply annotating them with RO, we add symbolic 
borrowing annotations a and b. The semantics of these borrowing annotations 
is the same as that of other ghost variables (such as f). In particular, the callee 
must behave correctly for any valuation of a and b, which leaves it no choice 
but to treat the corresponding heap fragments as read-only (hence preventing 
the heap fragments from being written). On the other hand, from the perspective 
of the caller, they serve as formal parameters that are substituted with 
actuals of the caller's choosing: for instance, when invoked with a caller's symbolic 
heap $\{x \overset{\mathsf{M}}{\mapsto} 1 * y \overset{c}{\mapsto} 2 * r \overset{\mathsf{M}}{\mapsto} 0\}$ 
(where c denotes a read-only borrow of the caller), 
readXY is guaranteed to "restore" the same access permissions in the 
postcondition, as per the substitution [M/a, c/b]. The example above demonstrates that 
read-only borrows are straightforward to compose when reasoning about code 
with function calls. They also make it possible to define borrow-polymorphic 
inductive heap predicates, e.g., enhancing ls from spec (1) so it can be used in 
specifications with mixed access permissions on their components.⁶ Finally, 
read-only borrows make it almost trivial to adapt the existing SSL-based synthesis 
to work with read-only access permissions; they reduce the complex permission 
accounting to easy-to-implement permission substitution. 
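The permission substitution at a call site can be sketched in a few lines. This is a toy model, not the ROBoSuSLik implementation: heaps are reduced to dictionaries from pointer names to permission annotations, formal and actual pointer names are assumed to coincide, and the function names `bind_borrows` and `apply_borrows` are hypothetical.

```python
MUTABLE = "M"


def bind_borrows(callee_pre, caller_heap):
    """Compute the substitution from the callee's borrow variables to caller permissions."""
    subst = {}
    for pointer, permission in callee_pre.items():
        actual = caller_heap[pointer]
        if permission == MUTABLE:
            # A callee that needs write access cannot be given a read-only borrow.
            if actual != MUTABLE:
                raise ValueError(f"{pointer} must be mutable at the call site")
        elif subst.setdefault(permission, actual) != actual:
            raise ValueError(f"borrow {permission} bound to two different permissions")
    return subst


def apply_borrows(spec, subst):
    """Replace borrow variables in a callee spec by the caller's actual permissions."""
    return {pointer: subst.get(perm, perm) for pointer, perm in spec.items()}


# Permissions of spec (3): x and y carry borrows a and b, r is mutable.
readxy_pre = {"x": "a", "y": "b", "r": MUTABLE}
readxy_post = {"x": "a", "y": "b", "r": MUTABLE}

# Caller's heap from the text: x and r are mutable, y is held under the borrow c.
caller = {"x": MUTABLE, "y": "c", "r": MUTABLE}

subst = bind_borrows(readxy_pre, caller)          # {'a': 'M', 'b': 'c'}
after_call = apply_borrows(readxy_post, subst)    # equals the caller's permissions
```

After the call the caller holds exactly the permissions it started with, which is the "restoring" behaviour described above for the substitution [M/a, c/b].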

Practical Contributions. Our first practical contribution is ROBoSuSLik, an 
enhancement of the SuSLik synthesis tool [34] with support for read-only 
borrows, which required us to modify less than 100 lines of the original code. 

Our second practical contribution is the extensive evaluation of synthesis with 
read-only permissions, on a standard benchmark suite of specifications for 
heap-manipulating programs. We compare the behaviour, performance, and the 
outcomes of the synthesis when run with the standard ("all-mutable") specifications 
and their analogues instrumented with read-only permissions wherever 
reasonable. By doing so, we substantiate the following claims regarding the practical 
impact of using read-only borrows in SSL specifications: 


— First, we show that synthesis of read-only specifications is more efficient: it 
does less backtracking while searching for a program that satisfies the imposed 
constraints, entailing better performance. 


6 We will present borrow-polymorphic inductive heap predicates in Sec. 2.4. 




— Second, we demonstrate that borrowing-aware synthesis is more effective: 
specifications with read-only annotations lead to more concise and human- 
readable programs, which do not perform redundant operations. 

— Third, we observe that read-only borrows increase expressivity of the synthe- 
sis: in most of the cases enhanced specifications provide stronger correctness 
guarantees for the results, at almost no additional annotation overhead. 

— Finally, we show that read-only borrows make the synthesis more robust: its 
results and performance are less likely to be affected by the unification order 
or the order of the attempted rule applications during the search. 


Paper Outline. We start by showcasing the intricacies and the virtues of SSL- 
based synthesis with read-only specifications in Sec. 2. We provide the formal 
account of read-only borrows and present the modified SSL rules, along with 
the soundness argument in Sec. 3. We report on the implementation and evalu- 
ation of the enhanced synthesis in Sec. 4. We conclude with a discussion on the 
limitations of read-only borrows in Sec. 5 and compare to related work in Sec. 6. 


2 Program Synthesis with Read-Only Borrows 


We introduce the enhancement of SSL with read-only borrows by walking the 
reader through a series of small but characteristic examples of deductive syn- 
thesis with separation logic. We provide the necessary background on SSL in 
Sec. 2.1; the readers familiar with the logic may want to skip to Sec. 2.2. 


2.1 Basics of SSL-based Deductive Program Synthesis 


In a deductive Separation Logic-based synthesis, a client provides a specifica- 
tion of a function of interest as a pair of pre- and post-conditions, such as 
{P} void foo(loc x, int i) {Q}. The precondition P constrains the symbolic 
state necessary to run the function safely (i.e., without crashes), while the post- 
condition Q constrains the resulting state at the end of the function’s execution. 
A function body c satisfying the provided specification is obtained as a result of 
deriving the SSL statement, representing the synthesis goal: 


$\{x, i\};\ \{P\} \leadsto \{Q\} \mid c$ 

In the statement above, x and i are program variables, and they are explicitly 
stated in the environment $\Gamma = \{x, i\}$. Variables that appear in {P} and that are 
not program variables are called (logical) ghost variables, while the non-program 
variables that only appear in {Q} are referred to as (logical) existential ones (EV). 
The meaning of the statement $\Gamma;\ \{P\} \leadsto \{Q\} \mid c$ is the validity of the Hoare-style 
triple {P} c {Q} for all possible values of variables from $\Gamma$.⁷ Both pre- and 
postcondition contain a spatial part describing the shape of the symbolic state 
(spatial formulae are ranged over via P, Q, and R), and a pure part (ranged over 
via $\phi$, $\psi$, and $\xi$), which states the relations between variables (both program 
and logical). A derivation of an SSL statement is conducted by applying logical 
rules, which reduce the initial goal to a trivial one, so it can be solved by one of 
the terminal rules, such as, e.g., the rule EMP shown below: 


$\textsc{Emp}\quad \dfrac{\vdash \phi \Rightarrow \psi}{\Gamma;\ \{\phi;\ \mathsf{emp}\} \leadsto \{\psi;\ \mathsf{emp}\} \mid \mathsf{skip}}$ 


7 We often care only about the existence of a program c to be synthesised, not its 
specific shape. In those cases we will be using a shorter statement: $\Gamma;\ \{P\} \leadsto \{Q\}$. 


That is, EMP requires that (i) symbolic heaps in both pre- and post-conditions 
are empty and (ii) that the pure part $\phi$ of the precondition implies the pure 
part $\psi$ of the postcondition. As the result, EMP "emits" a trivial program skip. 
Some of the SSL rules are aimed at simplifying the goal, bringing it to the shape 
that can be solved with EMP. For instance, consider the following rules: 


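$\textsc{Frame}\quad \dfrac{\mathsf{EV}(\Gamma, P, Q) \cap \mathsf{Vars}(R) = \emptyset \qquad \Gamma;\ \{\phi;\ P\} \leadsto \{\psi;\ Q\} \mid c}{\Gamma;\ \{\phi;\ P * R\} \leadsto \{\psi;\ Q * R\} \mid c}$ 

$\textsc{UnifyHeaps}\quad \dfrac{[\sigma]R' = R \qquad \emptyset \neq \mathsf{dom}(\sigma) \subseteq \mathsf{EV}(\Gamma, P, Q) \qquad \Gamma;\ \{\phi;\ P * R\} \leadsto [\sigma]\{\psi;\ Q * R'\} \mid c}{\Gamma;\ \{\phi;\ P * R\} \leadsto \{\psi;\ Q * R'\} \mid c}$ 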


Neither of the rules FRAME and UNIFYHEAPS "adds" to the program c being 
synthesised. However, FRAME reduces the goal by removing a matching part R 
(a.k.a. frame) from both the pre- and the post-condition. UNIFYHEAPS 
non-deterministically picks a substitution $\sigma$, which replaces existential variables in a 
sub-heap R' of the postcondition to match the corresponding symbolic heap R in 
the precondition. Both of these rules make choices with regard to what frame R 
to remove or which substitution $\sigma$ to adopt, a point that will be of importance 
for the development described in Sec. 2.2. 

Finally, the following (simplified) rule for producing a write command is oper- 
ational, as it emits a part of the program to be synthesised, while also modifying 
the goal accordingly. The resulting program will, thus, consist of the emitted 
store *x = e of an expression e to the pointer variable x. The remainder is syn- 
thesised by solving the sub-goal produced by applying the WRITE rule. 


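$\textsc{Write}\quad \dfrac{\mathsf{Vars}(e) \subseteq \Gamma \qquad e \neq e' \qquad \Gamma;\ \{\phi;\ x \mapsto e * P\} \leadsto \{\psi;\ x \mapsto e * Q\} \mid c}{\Gamma;\ \{\phi;\ x \mapsto e' * P\} \leadsto \{\psi;\ x \mapsto e * Q\} \mid {*x} = e;\ c}$ 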


As it is common with proof search, should no rule apply to an intermediate 
goal within one of the derivations, the deductive synthesis back-tracks, possibly 
discarding a partially synthesised program fragment, trying alternative deriva- 
tion branches. For instance, firing UNIFYHEAPS to unify wrong sub-heaps might 
lead the search down a path to an unsatisfiable goal, eventually making the 
synthesis back-track and leading to longer search. Consider also a misguided 
application of WRITE into a certain location, which can cause the synthesizer to 
generate a less intuitive program that “makes up” for the earlier spurious writes. 
This is precisely what we are going to fix by introducing read-only annotations. 


2.2 Reducing Non-Determinism with Read-Only Annotations 


Consider the following example adapted from the original SSL paper [34]. While 
the example is intentionally artificial, it captures a frequent synthesis scenario— 
non-determinism during synthesis. This specification allows a certain degree of 
freedom in how it can be satisfied: 




\[
\{x \mapsto 239 * y \mapsto 30\}\ \ \texttt{void pick(loc x, loc y)}\ \ \{z < 100;\ x \mapsto z * y \mapsto z\} \tag{4}
\]

It seems logical for the synthesis to start the program derivation by applying 
the rule UNIFYHEAPS, thus reducing the initial goal to the one of the form 

\[
\{x, y\};\ \{x \mapsto 239 * y \mapsto 30\} \leadsto \{239 < 100;\ x \mapsto 239 * y \mapsto 239\}
\]

This new goal has been obtained by picking one particular substitution
$\sigma = [239/z]$ (out of multiple possible ones), which delivers two identical
heaplets of the form $x \mapsto 239$ in pre- and postcondition. It is time for the
WRITE rule to strike to fix the discrepancy between the symbolic heap in the
pre- and postcondition by emitting the command *y = 239 (at last, some
executable code!), and resulting in the following new goal (notice the change
of the y-related entry in the precondition):

\[
\{x, y\};\ \{x \mapsto 239 * y \mapsto 239\} \leadsto \{239 < 100;\ x \mapsto 239 * y \mapsto 239\}
\]

What follows are two applications of the FRAME rule to the common symbolic
heaps, leading to the goal $\{x, y\};\ \{\mathsf{emp}\} \leadsto \{239 < 100;\ \mathsf{emp}\}$.
At this point, we are clearly in trouble. The pure part of the precondition is
simply true, while the postcondition’s pure part is 239 < 100, which is
unsolvable.

It turns out that our initial pick of the substitution $\sigma = [239/z]$ was an
unfortunate one, and we should discard the series of rule applications that
followed it, back-track and adopt a different substitution, e.g.,
$\sigma' = [30/z]$, which will indeed result in solving our initial goal.

Let us now consider the same specification for pick that has been enhanced 
by explicitly annotating parts of the symbolic heap as mutable and read-only: 

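\[
\{x \overset{M}{\mapsto} 239 * y \overset{RO}{\mapsto} 30\}\ \ \texttt{void pick(loc x, loc y)}\ \ \{z < 100;\ x \overset{M}{\mapsto} z * y \overset{RO}{\mapsto} z\} \tag{5}
\]

In this version of SSL, the effect of rules such as EMP, FRAME, and UNIFYHEAPS
remains the same, while operational rules, such as WRITE, become
annotation-aware. Specifically, the rule WRITE is now replaced by the
following one:

\[
\text{WRITERO}\quad
\frac{\mathrm{Vars}(e) \subseteq \Gamma \qquad e \neq e' \qquad \Gamma;\ \{\phi; x \overset{M}{\mapsto} e * P\} \leadsto \{\psi; x \overset{M}{\mapsto} e * Q\} \mid c}
     {\Gamma;\ \{\phi; x \overset{M}{\mapsto} e' * P\} \leadsto \{\psi; x \overset{M}{\mapsto} e * Q\} \mid {*x} = e;\ c}
\]

Notice how in the rule above the heaplets of the form $x \overset{M}{\mapsto} e$
are now annotated with the access permission M, which explicitly indicates
that the code may modify the corresponding heap location.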

Continuing with the example specification (5), we can imagine a similar
scenario when the rule UNIFYHEAPS picks the substitution $\sigma = [239/z]$.
Should this be the case, the next application of the rule WRITERO will not be
possible, due to the read-only annotation on the heaplet
$y \overset{RO}{\mapsto} 239$ in the resulting sub-goal:

WRITERO
\[
\{x, y\};\ \{x \overset{M}{\mapsto} 239 * y \overset{RO}{\mapsto} 30\} \leadsto \{239 < 100;\ x \overset{M}{\mapsto} 239 * y \overset{RO}{\mapsto} 239\}
\]

As the RO access permission prevents the synthesised code from modifying the
RO-annotated heaplets, the synthesis search is forced to back-track, picking
an alternative substitution $\sigma' = [30/z]$ and converging on the desirable
program *x = 30.
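The effect of the annotation on the search can be mimicked with a deliberately tiny model. The sketch below is not the SSL search procedure: the function names, the fixed order in which cells are repaired, and the way rule applications are tallied are all assumptions made for illustration. It only shows that a read-only heaplet lets the bad substitution be rejected before any store is emitted.

```python
# Toy model of the pick example: each location maps to (permission, value).
PRE = {"x": ("M", 239), "y": ("RO", 30)}

def try_substitution(z, pre, respect_ro):
    """Return (program, steps); program is None when the branch is a dead end."""
    steps = 1                      # one UNIFYHEAPS application picks z
    heap = dict(pre)
    program = []
    for loc in ("y", "x"):         # hypothetical fixed repair order
        perm, val = heap[loc]
        if val == z:
            continue
        if respect_ro and perm == "RO":
            return None, steps     # the annotated write rule does not apply: prune
        heap[loc] = (perm, z)
        program.append(f"*{loc} = {z}")
        steps += 1                 # one write rule application
    steps += 2                     # two FRAME applications
    if not z < 100:                # pure postcondition is only checked at the end
        return None, steps
    return program, steps

def synthesise(candidates, respect_ro):
    """Try candidate substitutions in order, back-tracking on dead ends."""
    total = 0
    for z in candidates:
        program, steps = try_substitution(z, PRE, respect_ro)
        total += steps
        if program is not None:
            return program, total
    return None, total

print(synthesise([239, 30], respect_ro=False))
print(synthesise([239, 30], respect_ro=True))
```

Both modes end with the same program, but in this toy tally the annotation-aware mode abandons the first substitution after a single step instead of four.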


8 One might argue that it was possible to detect the unsolvable conjunct 239 < 100 in
the postcondition immediately after performing substitution, thus sparing the need
to proceed with this derivation further. This is, indeed, a possibility, but it is
hard to argue which of the heuristics in applying the rules will work better in
general. We defer the quantitative argument on this matter until Sec. 4.4.


2.3 Composing Read-Only Borrows


Having synthesised the pick function from specification (5), we would like to 
use it in future programs. For example, imagine that at some point, while syn- 
thesising another program, we see the following as an intermediate goal: 


\[
\{u, v\};\ \{u \overset{M}{\mapsto} 239 * v \overset{M}{\mapsto} 30 * P\} \leadsto \{w < 200;\ u \overset{M}{\mapsto} w * v \overset{M}{\mapsto} w * Q\} \tag{6}
\]


It is clear that, modulo the names of the variables, we can synthesise a part of
the desired program by emitting a call pick(u, v), which we can then reduce to
the goal $\{u, v\};\ \{P\} \leadsto \{w < 200;\ Q\}$ via an application of FRAME.

Why is emitting such a call to pick() safe? Intuitively, this can be done because
the precondition of the spec (5) is weaker than the one in the goal (6). Indeed, the
precondition of the latter provides the full (mutable) access permission on the
heap portion $v \overset{M}{\mapsto} 30$, while the pre/postcondition of the former
requires a weaker form of access, namely read-only: $y \overset{RO}{\mapsto} 30$.
Therefore, our logical foundations should allow temporary “downgrading” of an
access permission, e.g., from M to RO, for the sake of synthesising calls. While
allowing this is straightforward and can be done similarly to up-casting a type
in languages like Java, what turns out to be less trivial is making sure that the
caller’s initial stronger access permission (M) is restored once pick(u, v) returns.

Non-solutions. Perhaps the simplest way to allow the call to a function with a
weaker (in terms of access permissions) specification would be to (a) downgrade
the caller’s permissions on the corresponding heap fragments to RO, and (b)
recover the permissions as per the callee’s specification. This approach
significantly reduces the expressivity of the logic (and, as a consequence,
completeness of the synthesis). For instance, adopting this strategy for using
specification (5) in the goal (6) would result in the unsolvable sub-goal of the form
$\{u, v\};\ \{u \overset{M}{\mapsto} 30 * v \overset{RO}{\mapsto} 30 * P\} \leadsto \{u \overset{M}{\mapsto} 30 * v \overset{M}{\mapsto} 30 * Q\}$.
This is due to the fact that the postcondition requires the heaplet
$v \overset{M}{\mapsto} 30$ to have the write-permission M, while the new
precondition only provides the RO-access.

Another way to cater for a weaker callee’s specification would be to “chip
out” a RO-permission from a caller’s M-annotation (in the spirit of fractional
permissions), offer it to the callee, and then “merge” it back to the caller’s
full-blown permission upon return. This solution works for simple examples, but not
for heap predicates with mixed permissions (discussion in Sec. 6). Yet another
approach would be to create a “RO clone” of the caller’s M-annotation,
introducing an axiom of the form
$x \overset{M}{\mapsto} t \dashv\vdash x \overset{M}{\mapsto} t * x \overset{RO}{\mapsto} t$.
The created component $x \overset{RO}{\mapsto} t$ could be provided to the callee
and discarded upon return since the caller retained the full permission of the
original heap. Several works on RO permissions have adopted this approach
[9, 11, 13]. While discarding such clones works just fine for sequential program
verification, in the case of synthesis guided by pre- and postconditions,
incomplete postconditions could lead to intractable goals.

Our solution. The key to gaining the necessary expressivity wrt.
passing/returning access permissions, while maintaining a sound yet simple logic,
is treating access permissions as first-class values. A natural consequence of
this treatment is that immutability annotations can be symbolic (i.e., variables
of a special sort “permission”), and the semantics of such variables is well
understood; we refer to these symbolic annotations as read-only borrows.⁹ For
instance, using borrows, we can represent the specification (5) as an equivalent one:

\[
\{x \overset{M}{\mapsto} 239 * y \overset{a}{\mapsto} 30\}\ \ \texttt{void pick(loc x, loc y)}\ \ \{z < 100;\ x \overset{M}{\mapsto} z * y \overset{a}{\mapsto} z\} \tag{7}
\]

The only substantial difference with spec (5) is that now the pointer y’s access
permission is given an explicit name a. Such named annotations (a.k.a. borrows)
are treated as RO by the callee, as long as the pure precondition does not
constrain them to be mutable. However, giving these permissions names achieves
an important goal: performing accurate accounting while composing
specifications with different access permissions. Specifically, we can now emit a
call to pick(u, v) as specified by (7) from the goal (6), keeping in mind the
substitution $\sigma = [u/x, v/y, M/a]$. This call now accounts for borrows as
well, and makes it straightforward to restore v’s original permission M upon
returning.
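The accounting that the named borrow enables can be sketched as plain substitution over annotated heaplets. This is a minimal illustration and not the tool's unification: heaplets are modelled as hypothetical (location, permission, value) triples, the frame P and the pure parts are left out, and the helper names are invented for the example.

```python
def instantiate(heaplets, subst):
    """Apply a substitution to (location, permission, value) triples."""
    return [tuple(subst.get(part, part) for part in h) for h in heaplets]

# Spec (7), with the pure part omitted: the permission of y is the borrow "a".
PICK_PRE = [("x", "M", 239), ("y", "a", 30)]
PICK_POST = [("x", "M", "z"), ("y", "a", "z")]

def call_pick(goal_pre, sigma):
    """Swap the sub-heap matching the callee's precondition for its postcondition."""
    needed = instantiate(PICK_PRE, sigma)
    if any(h not in goal_pre for h in needed):
        return None  # the call cannot be emitted under this substitution
    rest = [h for h in goal_pre if h not in needed]
    return rest + instantiate(PICK_POST, sigma)

goal_pre = [("u", "M", 239), ("v", "M", 30)]
print(call_pick(goal_pre, {"x": "u", "y": "v", "a": "M"}))
```

Because the same substitution instantiates both the pre- and the postcondition of the callee, the heaplet for v comes back with exactly the permission it was lent with (M here, or a caller-side borrow such as c).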

Following the same idea, borrows can be naturally composed through capture- 
avoiding substitutions. For instance, the same specification (7) of pick could be 
used to advance the following modified version of the goal (6): 


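\[
\{u, v\};\ \{u \overset{M}{\mapsto} 239 * v \overset{c}{\mapsto} 30 * P\} \leadsto \{w < 210;\ u \overset{M}{\mapsto} w * v \overset{c}{\mapsto} w * Q\}
\]
by means of taking the substitution $\sigma' = [u/x, v/y, c/a]$.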


2.4 Borrow-Polymorphic Inductive Predicates 


Separation Logic owes its glory to the extensive use of inductive heap predicates— 
a compact way to capture the shape and the properties of finite heap fragments 
corresponding to recursive linked data structures. Below we provide one of the 
most widely-used SL predicates, defining the shape of a heap containing a null- 
terminated singly-linked list with elements from a set S: 


\[
\begin{array}{rcl}
\mathsf{ls}(x, S) & \triangleq & x = 0 \wedge \{S = \emptyset;\ \mathsf{emp}\} \\
 & \mid & x \neq 0 \wedge \{S = \{v\} \cup S_1;\ [x, 2] * x \mapsto v * \langle x, 1\rangle \mapsto nxt * \mathsf{ls}(nxt, S_1)\}
\end{array}
\tag{8}
\]


The predicate contains two clauses describing the corresponding cases of the
list’s shape depending on the value of the head pointer x. If x is zero, the list’s
heap representation is empty, and so is the set of elements S. Alternatively, if x
is not zero, it stores a record with two items (indicated by the block assertion
$[x, 2]$), such that the payload pointer x contains the value v (where
$S = \{v\} \cup S_1$ for some set $S_1$), and the pointer corresponding to x + 1
(denoted as $\langle x, 1\rangle$) contains the address of the list’s tail, nxt.

While expressive enough to specify and enable synthesis of various list-traversing 
and list-generating recursive functions via SSL, the definition (8) does not allow 
one to restrict the access permissions to different components of the list: all of 
the involved memory locations can be mutated (which explains the synthesis 
issue we described in Sec. 1.1). To remedy this weakness of the traditional SL- 
style predicates, we propose to parameterise them with read-only borrows, thus 
making them aware of different access permissions to their various components. 
For instance, we propose to redefine the linked list predicate as follows: 


\[
\begin{array}{rcl}
\mathsf{ls}(x, S, a, b, c) & \triangleq & x = 0 \wedge \{S = \emptyset;\ \mathsf{emp}\} \\
 & \mid & x \neq 0 \wedge \{S = \{v\} \cup S_1;\ [x, 2]^{a} * x \overset{b}{\mapsto} v * \langle x, 1\rangle \overset{c}{\mapsto} nxt * \mathsf{ls}(nxt, S_1, a, b, c)\}
\end{array}
\tag{9}
\]

9 In this regard, our symbolic borrows are very similar to abstract fractional
permissions in CHALICE and VERIFAST [21,27]. We discuss the relation in detail in Sec. 6.

The new definition (9) is similar to the old one (8), but now, in addition to 
the standard predicate parameters (i.e., the head pointer x and the set S in this 
case), also features three borrow parameters a, b, and c that stand as place- 
holders for the access permissions to some particular components of the list. 
Specifically, the symbolic borrows b and c control the permissions to manipulate
the pointers x and x + 1, respectively. The borrow a, modifying a block-type
heaplet, determines whether the record starting at x can be deallocated
with free(x). All three borrows are passed in the same configuration to the
recursive instance of the predicate, thereby imposing the same constraints on 
the rest of the corresponding list components. 
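To make the role of the three borrow parameters concrete, the following sketch unfolds the predicate (9) over a concrete list into annotated heaplets. It is an illustration under assumptions: the tuple encoding of heaplets, the function names, and the use of 0 for the null pointer are choices made for the example, and any permission other than the constant M is treated as non-writable.

```python
def unfold_ls(cells, a, b, c):
    """Unfold ls(x, S, a, b, c) for a concrete list given as (address, payload) pairs."""
    heaplets = []
    for i, (addr, payload) in enumerate(cells):
        nxt = cells[i + 1][0] if i + 1 < len(cells) else 0  # 0 stands for null
        heaplets.append(("block", addr, 2, a))        # [x, 2]^a
        heaplets.append(("pts", addr, payload, b))    # x ->b v
        heaplets.append(("pts", addr + 1, nxt, c))    # <x, 1> ->c nxt
    return heaplets

def writable(heaplets):
    """Addresses of points-to heaplets annotated with the mutable permission M."""
    return sorted(h[1] for h in heaplets if h[0] == "pts" and h[3] == "M")

cells = [(10, 5), (20, 7)]
print(writable(unfold_ls(cells, "d", "M", "e")))   # only the payload cells
print(writable(unfold_ls(cells, "M", "M", "M")))   # payload and next-pointer cells
```

Instantiating only b with M leaves the payload cells as the sole writable locations, since the same borrows are threaded through every recursive unfolding.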

Let us see the borrow-polymorphic inductive predicates in action. Consider
the following specification that asks for a function taking a list of arbitrary values
and replacing all of them with zeroes:¹⁰

\[
\{\mathsf{ls}(x, S, d, M, e)\}\ \ \texttt{void reset(loc x)}\ \ \{\mathsf{ls}(x, O, d, M, e)\} \tag{10}
\]

The spec (10) gives very little freedom to the function that would satisfy it
with regard to permissions to manipulate the contents of the heap, constrained
by the predicate $\mathsf{ls}(x, S, d, M, e)$. As the first and the third borrow
parameters are instantiated with read-only borrows (d and e), the desired function
is not going to be able to change the structural pointers or deallocate parts of
the list. The only allowed manipulation is, thus, changing the values of the
payload pointers.

This concise specification is pleasantly strong. To wit, in plain SSL, a similar
spec (without read-only annotations) would also admit an implementation that
fully deallocates the list or arbitrarily changes its length. In order to avoid these
outcomes, one would, therefore, need to provide an alternative definition of the
predicate ls, which would incorporate the length property too.

Imagine now that one would like to use the implementation of reset satisfying
specification (10) to generate a function with the following spec, providing
stronger access permissions for the list components:

\[
\{\mathsf{ls}(y, S, M, M, M)\}\ \ \texttt{void call\_reset(loc y)}\ \ \{\mathsf{ls}(y, O, M, M, M)\}
\]

During the synthesis of call_reset, a call to reset is generated. For this
purpose the access permissions are borrowed and recovered as per spec (10) via
the substitution $[y/x, M/d, M/e]$ in a way described in Sec. 2.3.


2.5 Putting It All Together 


We conclude this overview by explaining how synthesis via SSL enhanced with 
read-only borrows avoids the issue with spurious writes outlined in Sec. 1.1. 

To begin, we change the specification to the following one, which makes use 
of the new list predicate (9) and prevents any modifications in the original list. 
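\[
\{r \overset{M}{\mapsto} x * \mathsf{ls}(x, S, a, b, c)\}\ \ \texttt{listcopy(r)}\ \ \{r \overset{M}{\mapsto} y * \mathsf{ls}(x, S, a, b, c) * \mathsf{ls}(y, S, M, M, M)\}
\]

We should remark that, contrary to the solution sketched at the end of Sec. 1.1,
which suggested using the predicate instance of the shape ls(x, S)[RO], our
concrete proposal does not allow us to constrain the entire predicate with a single
access permission (e.g., RO). Instead, we allow fine-grained access control to
its particular elementary components by annotating each one with an individual
borrow. The specification above allows the greatest flexibility wrt. access
permissions to the original list by giving them different names (a, b, c).


10 We use O as a notation for a multi-set with an arbitrary finite number of zeroes.


Variable        $x, y$          Alpha-numeric identifiers
Size, offset    $n, \iota$      Non-negative integers
Expression      $e ::= 0 \mid \mathsf{true} \mid x \mid e = e \mid e \wedge e \mid \neg e$
Command         $c ::= \mathsf{let}\ x = {*(x + \iota)} \mid {*(x + \iota)} = e \mid \mathsf{let}\ x = \mathsf{malloc}(n) \mid \mathsf{free}(x)$
                $\quad \mid \mathsf{err} \mid f(\overline{e_i}) \mid c; c \mid \mathsf{if}\ (e)\ \{c\}\ \mathsf{else}\ \{c\}$
Fun. dict.      $\Delta ::= \epsilon \mid \Delta, f(\overline{x_i})\ \{c\}$


Fig. 2: Programming language grammar.


Pure term       $\phi, \psi, \chi, \alpha ::= 0 \mid \mathsf{true} \mid M \mid RO \mid x \mid \phi = \phi \mid \phi \wedge \phi \mid \neg\phi$
Symbolic heap   $P, Q, R ::= \mathsf{emp} \mid \langle e, \iota\rangle \overset{\alpha}{\mapsto} e \mid [e, \iota]^{\alpha} \mid p(\overline{\phi_i}) \mid P * Q$
Heap predicate  $\mathcal{D} ::= p(\overline{x_i})\ \overline{\langle e_k, \{\chi_k, R_k\}\rangle}$
Function spec   $\mathcal{F} ::= f(\overline{x_i}) : \{\mathcal{P}\}\{\mathcal{Q}\}$
Assertion       $\mathcal{P}, \mathcal{Q} ::= \{\phi; P\}$
Environment     $\Gamma ::= \epsilon \mid \Gamma, x$
Context         $\Sigma ::= \epsilon \mid \Sigma, \mathcal{D} \mid \Sigma, \mathcal{F}$


Fig. 3: BoSSL assertion syntax.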

In the process of synthesising the non-trivial branch of listcopy, the search 
at some point will come up with the following intermediate goal: 

\[
\begin{array}{l}
\{x, r, nxt, v, y12\}; \\
\{S = \{v\} \cup S_1;\ r \overset{M}{\mapsto} y12 * [x, 2]^{a} * x \overset{b}{\mapsto} v * \langle x, 1\rangle \overset{c}{\mapsto} nxt * \mathsf{ls}(y12, S_1, M, M, M) * \ldots\} \\
\leadsto \{[z, 2]^{M} * z \overset{M}{\mapsto} v * \langle z, 1\rangle \overset{M}{\mapsto} y12 * \mathsf{ls}(y12, S_1, M, M, M) * \ldots\}
\end{array}
\]


Since the logical variable z in the postcondition is an existential one, the greyed
part of the symbolic heap (the three heaplets of the record at z) can be satisfied
by either (a) re-purposing the greyed part of the precondition, i.e., the three
heaplets of the record at x (which is what the implementation in Sec. 1.1 does), or
(b) allocating a corresponding record of two elements (as should be done). With
the read-only borrows in place, the unification of the two greyed fragments in the
pre- and postcondition via UNIFYHEAPS fails, because the mutable annotation
of $z \overset{M}{\mapsto} v$ in the post cannot be matched by the read-only borrow
$x \overset{b}{\mapsto} v$ in the precondition. Therefore, not being able to follow
the derivation path (a), the synthesiser is forced to explore an alternative one,
eventually deriving the version of listcopy without tail-swapping.


3 BoSSL: Borrowing Synthetic Separation Logic 


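We now give a formal presentation of BoSSL, a version of SSL extended with
read-only borrows. Fig. 2 and Fig. 3 present its programming and assertion
language, respectively. For simplicity, we formalise a core language without theories
(e.g., natural numbers), similar to the one of SMALLFOOT [6]; the only sorts in
the core language are locations, booleans, and permissions (where permissions
appear only in specifications) and the pure logic only has equality. In contrast,
our implementation supports integers and sets (where the latter also only
appear in specifications), with linear arithmetic and standard set operations. We do
not formalise sort-checking of formulae; however, for readability, we will use the
meta-variable $\alpha$ where the intended sort of the pure logic term is “permission”,
and Perm for the set of all permissions. The permission to allocate or deallocate
a memory-block $[x, n]^{\alpha}$ is controlled by $\alpha$.


\[
\text{WRITE}\quad
\frac{\mathrm{Vars}(e) \subseteq \Gamma \qquad e \neq e' \qquad \Gamma;\ \{\phi; \langle x, \iota\rangle \overset{M}{\mapsto} e * P\} \leadsto \{\psi; \langle x, \iota\rangle \overset{M}{\mapsto} e * Q\} \mid c}
     {\Gamma;\ \{\phi; \langle x, \iota\rangle \overset{M}{\mapsto} e' * P\} \leadsto \{\psi; \langle x, \iota\rangle \overset{M}{\mapsto} e * Q\} \mid {*(x + \iota)} = e;\ c}
\]

\[
\text{ALLOC}\quad
\frac{\begin{array}{c}
R = [z, n]^{\alpha} * \mathop{\ast}_{0 \le i < n} \big(\langle z, i\rangle \overset{\alpha_i}{\mapsto} e_i\big) \qquad (\{y\} \cup \{\overline{t_i}\}) \cap \mathrm{Vars}(\Gamma, \mathcal{P}, \mathcal{Q}) = \emptyset \qquad z \in \mathrm{EV}(\Gamma, \mathcal{P}, \mathcal{Q}) \\
R' \triangleq [y, n]^{M} * \mathop{\ast}_{0 \le i < n} \big(\langle y, i\rangle \overset{M}{\mapsto} t_i\big) \qquad \Sigma; \Gamma;\ \{\phi; P * R'\} \leadsto \{\psi; Q * R\} \mid c
\end{array}}
{\Sigma; \Gamma;\ \{\phi; P\} \leadsto \{\psi; Q * R\} \mid \mathsf{let}\ y = \mathsf{malloc}(n);\ c}
\]

\[
\text{FREE}\quad
\frac{R = [x, n]^{M} * \mathop{\ast}_{0 \le i < n} \big(\langle x, i\rangle \overset{M}{\mapsto} e_i\big) \qquad \mathrm{Vars}(\{x\} \cup \{\overline{e_i}\}) \subseteq \Gamma \qquad \Sigma; \Gamma;\ \{\phi; P\} \leadsto \{\mathcal{Q}\} \mid c}
     {\Sigma; \Gamma;\ \{\phi; P * R\} \leadsto \{\mathcal{Q}\} \mid \mathsf{free}(x);\ c}
\]


Fig. 4: BoSSL derivation rules.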


3.1 BoSSL rules 


New rules of BoSSL are shown in Fig. 4. The figure contains only 3 rules: this
minimal adjustment is possible thanks to our approach to unification and
permission accounting from first principles. Writing to a memory location requires its
corresponding symbolic heap to be annotated as mutable. Note that for a
precondition $\{a = M;\ \langle x, \iota\rangle \overset{a}{\mapsto} 5\}$, a normalisation
rule like SUBSTLEFT would first transform it into
$\{M = M;\ \langle x, \iota\rangle \overset{M}{\mapsto} 5\}$, at which point the WRITE
rule can be applied. Note also that ALLOC does not require specific permissions on
the block in the postcondition; if they turn out to be RO, the resulting goal is
unsolvable.

Unsurprisingly, the rule for accessing a memory cell just for reading purposes 
requires no adjustments since any permission allows reading. Moreover, the CALL 
rule for method invocation does not need adjustments either. Below, we describe 
how borrow and return seamlessly operate within a method call: 


\[
\text{CALL}\quad
\frac{\begin{array}{c}
\mathcal{F} \triangleq f(\overline{x_i}) : \{\phi_f; P_f\}\{\psi_f; Q_f\} \in \Sigma \qquad R = [\sigma]P_f \qquad \phi \Rightarrow [\sigma]\phi_f \qquad \overline{e_i} = [\sigma]\overline{x_i} \\
\mathrm{Vars}(\overline{e_i}) \subseteq \Gamma \qquad \phi' \triangleq [\sigma]\psi_f \qquad R' \triangleq [\sigma]Q_f \qquad \Sigma; \Gamma;\ \{\phi \wedge \phi'; P * R'\} \leadsto \{\mathcal{Q}\} \mid c
\end{array}}
{\Sigma; \Gamma;\ \{\phi; P * R\} \leadsto \{\mathcal{Q}\} \mid f(\overline{e_i});\ c}
\]


The CALL rule fires when a sub-heap R in the precondition of the goal can
be unified with the precondition $P_f$ of a function f from context $\Sigma$. Some
salient points are worth mentioning here: (1) the annotation borrowing from R to
$P_f$ for those symbolic sub-heaps in $P_f$ which require read-only permissions is
handled by the unification of $P_f$ with R, namely $R = [\sigma]P_f$ (i.e.,
substitution accounts for borrows: $\alpha/a$); (2) the annotation recovery in the
new precondition is implicit via $R' \triangleq [\sigma]Q_f$, where the substitution
$\sigma$ was computed during the unification, that is, while borrowing; (3) finding
a substitution $\sigma$ for $R = [\sigma]P_f$ fails if R does not have sufficient
accessibility permissions to call f (i.e., substitutions of the form $\alpha/M$ are
disallowed since the domain of $\sigma$ may only contain existentials).
We reiterate that read-only specifications only manipulate symbolic borrows,
that is to say, RO constants are not expected in the specification.


3.2 Memory Model 
We closely follow the standard SL memory model [32,37] and assume
$\mathrm{Loc} \subset \mathrm{Val}$.

\[
(\text{Heap})\ \ h \in \mathrm{Heaps} ::= \mathrm{Loc} \rightharpoonup \mathrm{Val} \qquad (\text{Stack})\ \ s \in \mathrm{Stacks} ::= \mathrm{Var} \to \mathrm{Val}
\]

To enable C-like accounting of dynamically-allocated memory blocks, we
assume that the heap h also stores sizes of allocated blocks in dedicated locations.
Conceptually, this part of the heap corresponds to the meta-data of the memory
allocator. This accounting ensures that only a previously allocated memory
block can be disposed (as opposed to any set of allocated locations), enabling the
free command to accept a single argument, the address of the block. To model
this meta-data, we introduce a function $bl : \mathrm{Loc} \to \mathrm{Loc}$, where
$bl(x)$ denotes the location in the heap where the block meta-data for the address
x is stored, if x is the starting address of a block. In an actual language
implementation, $bl(x)$ might be, e.g., $x - 1$ (i.e., the meta-data is stored right
before the block).
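The block accounting can be pictured with a toy allocator. The sketch below is only an illustration of the idea, assuming bl(x) = x - 1 as in the example just given; the class name, the bump-pointer allocation strategy, and the choice to keep the size cells in a separate dictionary are assumptions made for the example.

```python
class ToyHeap:
    """Bump allocator that records each block's size at bl(x) = x - 1."""

    def __init__(self):
        self.cells = {}      # data locations -> values
        self.sizes = {}      # dedicated meta-data locations -> block sizes
        self.next_free = 1

    @staticmethod
    def bl(x):
        return x - 1         # meta-data sits right before the block

    def malloc(self, n):
        x = self.next_free + 1           # skip one cell for the meta-data
        self.sizes[self.bl(x)] = n
        for i in range(n):
            self.cells[x + i] = 0
        self.next_free = x + n
        return x

    def free(self, x):
        meta = self.bl(x)
        if meta not in self.sizes:
            raise ValueError(f"{x} is not the start of an allocated block")
        n = self.sizes.pop(meta)         # the recorded size tells us what to release
        for i in range(n):
            del self.cells[x + i]

heap = ToyHeap()
p = heap.malloc(2)
q = heap.malloc(1)
heap.free(p)                 # a single argument suffices: the size is looked up
print(sorted(heap.cells))
```

Calling free on an address in the middle of a block fails in this model, which mirrors the requirement that only previously allocated blocks can be disposed.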

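Since we have opted for an unsophisticated permission mechanism, where the
heap ownership is not divisible, but some heap locations are restricted to RO,
the definition of the satisfaction relation $\vDash^{\Sigma,R}_{\mathcal{I}}$ for the
annotated assertions in a particular context $\Sigma$ and given an interpretation
$\mathcal{I}$, is parameterised with a fixed set of read-only locations, R:

- $\langle h, s\rangle \vDash^{\Sigma,R}_{\mathcal{I}} \{\phi; \mathsf{emp}\}$ iff $[\![\phi]\!]_s = \mathsf{true}$ and $\mathrm{dom}(h) = \emptyset$.
- $\langle h, s\rangle \vDash^{\Sigma,R}_{\mathcal{I}} \{\phi; \langle e_1, \iota\rangle \overset{\alpha}{\mapsto} e_2\}$ iff $[\![\phi]\!]_s = \mathsf{true}$ and $l \triangleq [\![e_1]\!]_s + \iota$ and $\mathrm{dom}(h) = \{l\}$ and $h(l) = [\![e_2]\!]_s$ and $l \in R \Leftrightarrow \alpha = RO$.
- $\langle h, s\rangle \vDash^{\Sigma,R}_{\mathcal{I}} \{\phi; [e, n]^{\alpha}\}$ iff $[\![\phi]\!]_s = \mathsf{true}$ and $l \triangleq bl([\![e]\!]_s)$ and $\mathrm{dom}(h) = \{l\}$ and $h(l) = n$ and $l \in R \Leftrightarrow \alpha = RO$.
- $\langle h, s\rangle \vDash^{\Sigma,R}_{\mathcal{I}} \{\phi; P_1 * P_2\}$ iff $\exists h_1, h_2.\ h = h_1 \uplus h_2$ and $\langle h_1, s\rangle \vDash^{\Sigma,R}_{\mathcal{I}} \{\phi; P_1\}$ and $\langle h_2, s\rangle \vDash^{\Sigma,R}_{\mathcal{I}} \{\phi; P_2\}$.
- $\langle h, s\rangle \vDash^{\Sigma,R}_{\mathcal{I}} \{\phi; p(\overline{\psi_i})\}$ iff $[\![\phi]\!]_s = \mathsf{true}$ and $\mathcal{D} \triangleq p(\overline{x_i})\ \overline{\langle e_k, \{\chi_k, R_k\}\rangle} \in \Sigma$ and $\langle h, \overline{[\![\psi_i]\!]_s}\rangle \in \mathcal{I}(\mathcal{D})$ and $\bigvee_k \big(\langle h, s\rangle \vDash^{\Sigma,R}_{\mathcal{I}} [\overline{\psi_i / x_i}]\{\phi \wedge e_k \wedge \chi_k; R_k\}\big)$.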


There are two non-standard cases: points-to and block, whose permissions 
must agree with R. Note that in the definition of satisfaction, we only need to
consider the case where the permission $\alpha$ is a value (i.e., either RO or M).
Although in a specification $\alpha$ can also be a variable, well-formedness guarantees
that this variable must be logical, and hence will be substituted away in the 
definition of validity. We stress the fact that a reference that has RO permissions 
to a certain symbolic heap still retains the full ownership of that heap, with the 
restriction that it is not allowed to update or deallocate it. Note that deallocation 
additionally requires a mutable permission for the enclosing block. 
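The points-to case of the satisfaction relation can be read as a small executable check. The sketch below is a simplification under stated assumptions: the pure part is taken to be true, expressions are either stack variables (strings) or integer literals, and the function name and argument order are invented for the example.

```python
def sat_points_to(h, s, ro_locs, e1, offset, perm, e2):
    """Check <h, s> |= <e1, offset> ->perm e2 against the read-only set ro_locs."""
    l = s[e1] + offset
    value = s[e2] if isinstance(e2, str) else e2
    return (set(h) == {l}                                # the heap is exactly this cell
            and h[l] == value                            # and it stores the right value
            and ((l in ro_locs) == (perm == "RO")))      # l in R  <=>  perm = RO

h = {100: 7}
s = {"x": 100, "v": 7}
print(sat_points_to(h, s, {100}, "x", 0, "RO", "v"))
print(sat_points_to(h, s, {100}, "x", 0, "M", "v"))
```

The last conjunct is what ties an annotation to the fixed set R: a location listed in R can only be described by an RO heaplet, and a location outside R only by an M heaplet.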


3.3 Soundness


The BoSSL operational semantics is in the spirit of the traditional SL [38], and 
hence is omitted for the sake of saving space (selected rules are available in 
the extended version of the paper). The validity definition and the soundness 
proofs of SSL are ported to BoSSL without any modifications, since our current 
definition of satisfaction implies the one defined for SSL: 


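Definition 1 (Validity). We say that a well-formed Hoare-style specification
$\Sigma; \Gamma; \{\mathcal{P}\}\ c\ \{\mathcal{Q}\}$ is valid wrt. the function dictionary
$\Delta$ iff whenever $\mathrm{dom}(s) = \Gamma$,
$\forall \sigma_{gv} = [\overline{x_i \mapsto d_i}]_{x_i \in \mathrm{GV}(\Gamma, \mathcal{P}, \mathcal{Q})}$
such that $\langle h, s\rangle \vDash^{\Sigma}_{\mathcal{I}} [\sigma_{gv}]\mathcal{P}$, and
$\Delta; \langle h, (c, s) \cdot \epsilon\rangle \leadsto^{*} \langle h', (\mathsf{skip}, s') \cdot \epsilon\rangle$,
it is also the case that
$\langle h', s'\rangle \vDash^{\Sigma}_{\mathcal{I}} [\sigma_{ev} \cup \sigma_{gv}]\mathcal{Q}$
for some $\sigma_{ev} = [\overline{y_j \mapsto d_j}]_{y_j \in \mathrm{EV}(\Gamma, \mathcal{P}, \mathcal{Q})}$.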


The following theorem guarantees that, given a program c generated with 
BoSSL, a heap model, and a set of read-only locations R that satisfy the pro- 
gram’s precondition, executing c does not change those read-only locations: 


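Theorem 1 (RO Heaps Do Not Change). Given a Hoare-style specification
$\Sigma; \Gamma; \{\phi; P\}\ c\ \{\mathcal{Q}\}$, which is valid wrt. the function
dictionary $\Delta$, and a set of read-only memory locations R, if:
(i) $\langle h, s\rangle \vDash^{\Sigma,R}_{\mathcal{I}} [\sigma]P$, for some h, s and $\sigma$, and
(ii) $\Delta; \langle h, (c, s) \cdot \epsilon\rangle \leadsto^{*} \langle h', (c', s') \cdot \epsilon\rangle$ for some h', s' and c', and
(iii) $R \subseteq \mathrm{dom}(h)$,
then $R \subseteq \mathrm{dom}(h')$ and $\forall l \in R,\ h(l) = h'(l)$.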
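The conclusion of the theorem is a simple property of two heaps, which can be stated as a one-line check. The sketch below assumes heaps are Python dictionaries from locations to values and that R is contained in the domain of the initial heap, as in condition (iii); the function name is hypothetical.

```python
def ro_preserved(h_before, h_after, ro_locs):
    """True iff every read-only location is still allocated and holds its old value."""
    return all(l in h_after and h_before[l] == h_after[l] for l in ro_locs)

before = {1: 30, 2: 239}
after = {1: 30, 2: 30}       # location 2 was overwritten, location 1 was not
print(ro_preserved(before, after, {1}))
print(ro_preserved(before, after, {2}))
```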


Starting from an abstract state where a spatial heap has a read-only permis- 
sion, under no circumstance can this permission be strengthened to M: 


Corollary 1 (No Permission Strengthening). Given a valid Hoare-style
specification $\Sigma; \Gamma; \{\phi; P\}\ c\ \{\psi; Q\}$ and a permission $\alpha$, if
$\psi \Rightarrow (\alpha = M)$ then it is also the case that $\phi \Rightarrow (\alpha = M)$.


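As it turns out, permission weakening is possible, since, though problematic,
postcondition weakening is sound in general. However, even though this affects
completeness, it does not affect our termination results. For example, given a
synthesised auxiliary function
$\mathcal{F} \triangleq f(x, r) : \{x \overset{a_1}{\mapsto} t * r \overset{M}{\mapsto} x\}\{x \overset{a_2}{\mapsto} t * r \overset{M}{\mapsto} t + 1\}$,
and a synthesis goal
$\Sigma, \mathcal{F}; \Gamma;\ \{x \overset{M}{\mapsto} 7 * y \overset{M}{\mapsto} x\} \leadsto \{x \overset{M}{\mapsto} 7 * y \overset{M}{\mapsto} z\} \mid c$,
firing the CALL rule for the candidate function f(x, r) would lead to the
unsolvable goal
$\Sigma, \mathcal{F}; \Gamma;\ \{x \overset{a_2}{\mapsto} 7 * y \overset{M}{\mapsto} 8\} \leadsto \{x \overset{M}{\mapsto} 7 * y \overset{M}{\mapsto} z\} \mid f(x, y);\ c$.
FRAME may never be fired on this new goal since the permission of reference x in
the goal’s precondition has been permanently weakened. To eliminate such sources
of incompleteness we require the user-provided predicates and specifications to be
well-formed: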


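Definition 2 (Well-Formedness of Spatial Predicates). We say that a
spatial predicate $p(\overline{x_i})\ \overline{\langle e_k, \{\chi_k, R_k\}\rangle}_{k \in 1..N}$ is well-formed iff
$\big(\bigcup_{k=1}^{N} (\mathrm{Vars}(e_k) \cup \mathrm{Vars}(\chi_k) \cup \mathrm{Vars}(R_k)) \cap \mathrm{Perm}\big) \subseteq (\overline{x_i} \cap \mathrm{Perm})$.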




That is, every accessibility annotation within the predicate’s clause is bound by 
the predicate’s parameters. 
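Definition 2 amounts to a set inclusion, which the following sketch spells out. It assumes each clause is summarised by the set of variable names occurring in it, and that the set of permission-sorted variables is given explicitly; the function name and the concrete variable sets are illustrative only.

```python
def well_formed(params, clauses, perm_vars):
    """Permission variables used in any clause must be among the predicate's parameters."""
    used = set().union(*clauses)
    return (used & perm_vars) <= (set(params) & perm_vars)

perm_vars = {"a", "b", "c", "d"}
ls_params = ["x", "S", "a", "b", "c"]
ls_clauses = [{"x", "S"}, {"x", "S", "v", "S1", "nxt", "a", "b", "c"}]
print(well_formed(ls_params, ls_clauses, perm_vars))

# A clause mentioning a permission variable "d" that is not a parameter breaks the condition.
print(well_formed(ls_params, ls_clauses + [{"x", "d"}], perm_vars))
```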


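Definition 3 (Well-Formedness of Specifications). We say that a Hoare-style
specification $\Sigma; \Gamma; \{\mathcal{P}\}\ c\ \{\mathcal{Q}\}$ is well-formed iff
$\mathrm{EV}(\Gamma, \mathcal{P}, \mathcal{Q}) \cap \mathrm{Perm} = \emptyset$ and
every predicate instance in $\mathcal{P}$ and $\mathcal{Q}$ is an instance of a
well-formed predicate.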


That is, postconditions are not allowed to have existential accessibility annota- 
tions in order to avoid permanent weakening of accessibility. 

A callee that requires borrows for a symbolic heap always returns back to the 
caller its original permission for that respective symbolic heap: 


Corollary 2 (Borrows Always Return). A heaplet with permission $\alpha$ either
(a) retains the same permission $\alpha$ after a call to a function that is decorated with
well-formed specifications and that requires that heaplet to have read-only
permission, or (b) may be deallocated in case $\alpha = M$.


4 Implementation and Evaluation 


We implemented BoSSL in an enhanced version of the SUSLIK tool, which
we refer to as ROBOSUSLIK [12].¹¹ The changes to the original SUSLIK
infrastructure affected less than 100 lines of code. The extended synthesis is
backwards-compatible with the original benchmarks. To make this possible, we
treat the original SSL specifications as annotated/instantiated with M
permissions, whenever necessary, which is consistent with the treatment of access
permissions in BoSSL.

We have conducted an extensive experimental evaluation of ROBOSUSLIK, 
aiming to answer the following research questions: 

1. Do borrowing annotations improve the performance of SSL-based synthesis 
when using standard search strategy [34, § 5.2]? 

2. Do read-only borrows improve the quality of synthesised programs, in terms of
size and comprehensibility, wrt. their counterparts obtained from regular,
“all-mutable” specifications?

3. Do we obtain stronger correctness guarantees for the programs from the stan- 
dard SSL benchmark suite [34, § 6.1] by simply adding, whenever reasonable, 
read-only annotations to their specifications? 

4. Do borrowing specifications enable more robust synthesis? That is, should we 
expect to obtain better programs/synthesis performance on average regardless 
of the adopted unification and search strategies? 


4.1 Experimental Setup 


Benchmark Suite. To tackle the above research questions, we have adopted most
of the heap-manipulating benchmarks from the SUSLIK suite [34, § 6.1] (with some
variations) into our sets of experiments. In particular, we looked at the groups
of benchmarks which manipulate singly linked list segments, sorted linked list
segments, and binary trees. We did not include the benchmarks concerning binary
search trees (BSTs) for the reasons outlined in the next paragraph.


11 The sources are available at https://github.com/TyGuS/robosuslik.

The Tools. For a fair comparison which accounts for the latest advancements
to SUSLIK, we chose to parameterise the synthesis process with a flag that
turns the read-only annotations on and off (off means that they are set to be
mutable). The values obtained with this flag set are marked in the experiments
with RO, while those marked with Mut ignore the read-only annotations during
the synthesis process. For simplicity, we will refer to the two instances of the
tool, namely RO and Mut, as two different tools. Each tool was set to time out
after 2 minutes of attempting to synthesise a program.


Criteria. In an attempt to quantify our results, we have looked at the size of 
the synthesised program (AST size), the absolute time needed to synthesise the 
code given its specification, averaged over several runs (Time), the number of 
backtrackings in the proof search due to nondeterminism (#Backtr), the total 
number of rule applications that the synthesis fired during the search (#Rules), 
including those that lead to unsolvable goals, and the strength of the guarantees 
offered by the specifications (Stronger Guarantees). 


Variables. Some benchmarks have shown improvement over the synthesis pro- 
cess without the read-only annotations. To emphasise the fact that read-only 
annotations’ improvements are not accidental, we have varied the inductive defi- 
nitions of the corresponding benchmarks to experiment with different properties 
of the underlying structure: the shape of the structure (in all the definitions), 
the length of the structure (for those benchmarks tagged with len), the values 
stored within the structure (val), a combination of all these properties (all) as 
well as with the sortedness property for the “Sorted list” group of benchmarks. 


Experiment Schema. To measure the performance and the quality of the
borrowing-aware synthesis we ran the benchmarks against the two different tools and did
a one-to-one comparison of the results. We ran each tool three times for each
benchmark, and averaged the resulting synthesis time. All the other evaluation
criteria remain constant across all three runs.

To measure the tools’ robustness we stressed the synthesis algorithm by alter- 
ing the default proof search strategy. We prepared 42 such perturbations which 
we used to run against the different program variants enumerated above. Each 
pair of program variant and proof strategy perturbation has been then analysed 
to measure the number of rules that had been fired by RO and Mut. 


Hardware Setup. The experiments were conducted on a 64-bit machine running 
Ubuntu, with an Intel Xeon CPU (6 cores, 2.40GHz) with 32GB RAM. 


4.2 Performance and Quality of the Borrowing-Aware Synthesis 


Tab. 1 captures the results of running RO and Mut against the considered bench- 
marks. It provides the empirical proof that the borrowing-aware synthesis im- 
proves the performance of the original SSL-based synthesis, or in other words, 
answering positively the Research Question 1. RO suffers almost no loss in per- 
formance (except for a few cases, such as the list segment append where there 
is a negligible increase in time), while the gain is considerable for those synthe- 
sis problems with complex pointer manipulation. For example, if we consider 
the number of fired rules as the performance measurement criteria, in the worst 

Group               | Description   | AST size | Time (sec)       | #Backtr.          | #Rules            | Stronger
                    |               |  RO  Mut |   RO  Mut Mut/RO |   RO   Mut Mut/RO |   RO   Mut Mut/RO | Guarant.
Linked List Segment | append        |  20   20 |    ?    ?      ? |    8     8   1.0x |   77    78   1.0x | YES
                    | delete        |  44   44 |  1.9  2.1   1.1x |   67    67   1.0x |  180   180   1.0x | same
                    | dispose       |  11   11 |  0.5  0.5   1.0x |    0     0   1.0x |    8     8   1.0x | same
                    | init          |  13   13 |  0.7  0.7   1.0x |    5     5   1.0x |   27    27   1.0x | YES
                    | lcopy         |  32   35 |  1.0  1.0   1.0x |    9    14   1.5x |   66    82   1.2x | YES
                    | length        |  22   22 |  1.5  1.5   1.0x |    2     2   1.0x |   38    38   1.0x | YES
                    | max           |  28   28 |  1.4  1.5   1.1x |    2     2   1.0x |   38    38   1.0x | YES
                    | min           |  28   28 |  1.5  1.5   1.0x |    2     2   1.0x |   38    38   1.0x | YES
                    | singleton     |  11   11 |  0.5  0.5   1.0x |    8     8   1.0x |   30    30   1.0x | same
Sorted List         | ins-sort-all  |  29   29 |  3.7  3.8   1.0x |    5     5   1.0x |   60    60   1.0x | YES
                    | ins-sort-len  |  29   29 |  3.0  3.0   1.0x |    7     8   1.1x |   59    60   1.0x | YES
                    | ins-sort-val  |  29   29 |  2.6  2.5   1.0x |    5     5   1.0x |   57    57   1.0x | YES
                    | insert        |  53   53 |  7.8  8.0   1.0x |    ?     ?      ? |  214   338   1.6x | YES
                    | prepend       |  11   11 |    ?    ?      ? |    1     1   1.0x |   17    17   1.0x | YES
Tree                | dispose       |  16   16 |  0.4  0.5   1.2x |    0     0   1.0x |   10    10   1.0x | same
                    | flatten-acc   |  35   35 |  2.1  2.0   1.0x |   24    24   1.0x |  118   118   1.0x | same
                    | flatten-app   |  48   48 |  1.6  1.7   1.0x |   14    14   1.0x |   76    76   1.0x | same
                    | morph         |  19   19 |  0.6  0.5   1.0x |    1     1   1.0x |   24    24   1.0x | YES
                    | tcopy-all     |   ?    ? |    ?    ?      ? |   10    88   8.8x |   85     ?   3.5x | YES
                    | tcopy-len     |  36   42 |  1.3  2.0   1.5x |    6    90    15x |   72   304   4.2x | YES
                    | tcopy-val     |  42   51 |  1.4  5.3   3.8x |   10  1222   122x |   82  2673    32x | YES
                    | tcopy-ptr-all |  46   55 |  1.6  2.4   1.5x |   10    88   8.8x |   93   303   3.3x | YES
                    | tcopy-ptr-len |  40   46 |  1.3  2.2   1.7x |    6    90    15x |   80   311   3.9x | YES
                    | tcopy-ptr-val |  46   55 |  1.3  5.8   4.5x |   10  1222   122x |   89  2679    30x | YES
                    | tsize-all     |  32   38 |  1.5  1.4   0.9x |    2     4   2.0x |   45    51   1.1x | YES
                    | tsize-len     |  32   32 |    ?    ?      ? |    2     2   1.0x |   44    46   1.0x | YES
                    | tsize-ptr-all |  36   42 |  1.6  1.4   0.9x |    2     4   2.0x |   53    58   1.1x | YES
                    | tsize-ptr-len |  36   36 |  1.3  1.3   1.0x |    2     2   1.0x |   52    53   1.0x | YES


Table 1: Benchmarks and comparison between the results for synthesis with read- 
only annotations (RO) and without them (Mut). For each case study we measure 
the AST size of the synthesised program, the Time needed to synthesize the 
benchmark, the number of times that the synthesiser had to discard a derivation 
branch (#Backtr.), and the total number of fired rules (#Rules). 


case, RO behaves the same as Mut, while in the best scenario it buys us a 32-fold 
decrease in the number of applied rules. At the same time, synthesising a few 
small examples in the RO case is slightly slower, despite the same or smaller number 
of rule applications. This is due to the increased number of logical variables 
(because of added borrows) when discharging obligations via the SMT solver. 
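
The Mut/RO columns of Tab. 1 are plain ratios, so the best and worst cases quoted above can be rechecked from the raw rule counts. The following Python sketch does so for a handful of rows copied from the table; the helper and variable names are ours and are not part of the evaluation scripts.

```python
# Rule counts copied from Tab. 1: benchmark -> (rules fired by RO, rules fired by Mut).
RULES_FIRED = {
    "delete": (180, 180),
    "lcopy": (66, 82),
    "insert": (214, 338),
    "tcopy-len": (72, 304),
    "tcopy-val": (82, 2673),
    "tcopy-ptr-val": (89, 2679),
}


def mut_over_ro(ro: int, mut: int) -> float:
    """Return how many times more rules Mut fired than RO."""
    return mut / ro


# Ratio per benchmark, plus the benchmarks with the smallest and largest gain.
RATIOS = {name: mut_over_ro(ro, mut) for name, (ro, mut) in RULES_FIRED.items()}
WORST = min(RATIOS, key=RATIOS.get)
BEST = max(RATIOS, key=RATIOS.get)

if __name__ == "__main__":
    for name, ratio in RATIOS.items():
        print(f"{name:14s} {ratio:5.1f}x")
    print("smallest gain:", WORST, "| largest gain:", BEST)
```

On these rows the ratio ranges from 1.0 (delete, where RO and Mut coincide) to a little over 32 (tcopy-val), which matches the worst and best cases described in the text.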

Fig. 5 offers a statistical view of the numbers in the table, where smaller bars 
mark a better performance. The barplots indicate that as the complexity of the 
problem increases (approximately from left to right), RO outperforms Mut. 

Perhaps the most important take-away from this experiment is that 
synthesis with read-only borrows often produces a more concise program (light green 
cells in the column AST size of Tab. 1), while retaining the same or better 
performance wrt. all the evaluated criteria. For instance, RO gets rid of the spurious 
write from the motivating example introduced in Sec. 1, reducing the AST size 
from 35 nodes down to 32, while at the same time firing fewer rules. This also 
means that we secure a positive answer for Research Question 2. 


4.3 Stronger Correctness Guarantees 


Fig. 5: Statistics for synthesis with and without Read-Only specifications. 

To answer Research Question 3, we have manually compared the guarantees 
offered by the specifications annotated with RO permissions against the default 
ones; the results are summarized in the last column of Tab. 1. For instance, a 
specification stating that the shape of a linked-list segment is read-only implies 
that the size of that segment remains constant through the program’s execution. 
In other words, the length property need not be captured separately in the 
segment’s definition. If, in addition to the shape, the payload of the segment is 
also read-only, then the set of values and their ordering are also invariant. 

Consider the goal {lseg(x, y, s, a1, a2, a3)} ⇝ {lseg(x, y, s, a1, a2, a3)}, where 
lseg is an inductive definition of a list segment which ends at y and contains 
the set of values s. The borrowing-aware synthesiser will produce a program 
which is guaranteed to treat the segment pointed to by x and ending with y as 
read-only (that is, its shape, length, values and orderings are invariant). At the 
same time, for a goal {lseg(x, y, s)} ⇝ {lseg(x, y, s)}, the guarantees are that 
the returned segment still ends in y and contains values s. Internal modifications 
of the segment, such as reordering and duplicating list elements, may still occur. 

The few entries marked with same are programs whose specifications did 
not become stronger when instrumented with RO annotations (e.g., delete). These 
benchmarks require mutation over the entire data structure, hence the read-only 
annotations do not influence the offered guarantees. Overall, our observations 
that read-only annotations offer stronger guarantees are in agreement with the 
works on SL-based program verification [9,13], but are promoted here to the 
more challenging problem of program synthesis. 


4.4 Robustness under Synthesis Perturbations 


There is no single search heuristic that will work equally well for every 
specification: for a particular fixed search strategy, a synthesiser can exhibit 
suboptimal performance for some goals, while converging quickly on some others. 
By evaluating robustness wrt. the RO and Mut specification methodologies, we are 
hoping to show that, provided a large variety of “reasonable” search heuristics, 
read-only annotations deliver better synthesis performance “on average”. 

For this set of experiments, we have focused on four characteristic programs 
from our performance benchmarks based on their pointer manipulation complexity: 
list segment copy (lcopy), insertion into a sorted list segment (insert), 
copying a tree (tcopy), and a variation of the tree copy that shares the same 
pointer for the input tree and its returned copy (tcopy-ptr). 


Exploring Different Unification Orders. Since spatial unification stays at the core 
of the synthesis process, we implemented 6 different strategies for choosing a 
unification candidate based on the following criteria: the size of the heaplet chunk 
(favor the smallest heap vs. the largest one as the best unification candidate), the 
name of the predicate (we considered both an ascending as well as a descending 
priority queue), and a customised ranking function which associates a cost to a 
symbolic heap based on its kind—a block is cheaper to unify than a points-to 
which in turn is cheaper than a spatial predicate. 
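
To make the criteria above concrete, the following Python sketch orders unification candidates by size, by predicate name, or by a kind-based cost. It is only an illustration: the `Heaplet` record, the strategy names, and the numeric costs are hypothetical, and the text does not spell out all six strategies, so only five orderings derived directly from the listed criteria are shown.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical costs; only the ordering block < points-to < predicate comes from the text.
KIND_COST = {"block": 0, "points_to": 1, "predicate": 2}


@dataclass(frozen=True)
class Heaplet:
    kind: str  # "block", "points_to" or "predicate"
    name: str  # predicate name (or a label for blocks and points-to)
    size: int  # size of the heaplet chunk


# Each strategy is a (sort key, reverse?) pair.
STRATEGIES = {
    "smallest-first": (lambda h: h.size, False),
    "largest-first": (lambda h: h.size, True),
    "name-ascending": (lambda h: h.name, False),
    "name-descending": (lambda h: h.name, True),
    "kind-cost": (lambda h: KIND_COST[h.kind], False),
}


def rank_candidates(candidates: List[Heaplet], strategy: str) -> List[Heaplet]:
    """Return the candidates in the order in which unification would try them."""
    key, reverse = STRATEGIES[strategy]
    return sorted(candidates, key=key, reverse=reverse)


EXAMPLE = [
    Heaplet("predicate", "lseg", 3),
    Heaplet("points_to", "x", 1),
    Heaplet("block", "b", 2),
]

if __name__ == "__main__":
    for strategy in STRATEGIES:
        print(strategy, [h.name for h in rank_candidates(EXAMPLE, strategy)])
```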


Exploring Different Search Strategies. We next designed 6 strategies for prioritising 
the rule applications. One of the crucial rules in this matter is the WRITE 
rule, whose different priority schemes might make all the results seem randomly 
generated. In the cases where WRITE leads to unsolvable goals, one might rightfully 
argue that RO has a clear advantage over Mut (fail fast). However, in 
the cases where mutation leads to a solution faster, Mut might have an 
advantage over RO (solve fast). Because these are just intuitive observations, 
and for fairness’ sake, we experimented with both the cases where WRITE has a 
high and a low priority in the queue of rule phases [34, § 5.2]. Since most of the 
benchmarks involve recursion, we also chose to shuffle around the priorities of 
the OPEN and CALL rules. Again, we chose between a stack-high and a bottom-low 
priority for these rules to give a fair chance to both tools. 

We considered all combinations of the 6 unification permutations and the 6 
rule-application permutations (plus the default one) to obtain 42 different proof 
search perturbations. We will use the following notation in the narrative below: 
— S is the set comprising the synthesis problems: lcopy, insert, tcopy, tcopy-ptr. 
— V is the set of all specification variations: len, val, all. 
— K is the set of all 42 possible tool perturbations. 

The distributions of the number of rules fired for each tool (RO and Mut) 
with the 42 perturbations over the 4 synthesis problems with 3 variants of specification 
each, that is 1008 different synthesis runs, are summarised using the 
boxplots in Fig. 6. There is a boxplot corresponding to each pair of tool and 
synthesis problem. In the ideal case, each boxplot contains 126 data points, each 
corresponding to a unique combination (v, k) of a specification variation v ∈ V and 
a tool perturbation k ∈ K. A boxplot is the distribution of such data based on a 
six-number summary: minimum, first quartile, median, third quartile, maximum, 
outliers. For example, the boxplot for tcopy-ptr corresponding to RO and containing 
90 data points, reads as follows: “the synthesis processes fired between 
64 and 256 rules, with most of the processes firing between 64 and 128 rules. 
There are three exceptions where the synthesiser fired more than 256 rules”. Note 
that the y-axis represents the binary logarithm of the number of fired rules. 

Fig. 6: Boxplots of variations in log2(numbers of applied rules) for synthesis perturbations. 
Numbers of data points for each example are given in parentheses. 
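
The counts quoted in this experiment (42 perturbations, 126 data points per boxplot, 1008 runs) follow from simple enumeration. The Python sketch below reproduces the arithmetic under one assumption of ours: the default configuration is counted as a seventh rule-application option, giving 6 × 7 = 42. The identifiers for the individual strategies are placeholders.

```python
from itertools import product

TOOLS = ["RO", "Mut"]
PROBLEMS = ["lcopy", "insert", "tcopy", "tcopy-ptr"]  # the set S
VARIANTS = ["len", "val", "all"]                      # the set V

# Placeholder names: 6 unification orders, and 6 rule priorities plus the default.
UNIFICATION_ORDERS = [f"unif-{i}" for i in range(1, 7)]
RULE_PRIORITIES = ["default"] + [f"prio-{i}" for i in range(1, 7)]

# The set K of tool perturbations (assumed here to be a 6 x 7 product).
PERTURBATIONS = list(product(UNIFICATION_ORDERS, RULE_PRIORITIES))

# Every synthesis run is a (tool, problem, variant, perturbation) tuple.
RUNS = list(product(TOOLS, PROBLEMS, VARIANTS, PERTURBATIONS))

# Each boxplot fixes a tool and a problem, leaving the pairs (v, k).
POINTS_PER_BOXPLOT = len(VARIANTS) * len(PERTURBATIONS)

if __name__ == "__main__":
    print(len(PERTURBATIONS), POINTS_PER_BOXPLOT, len(RUNS))
```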

Even though we attempted to synthesise each program 126 times for each tool, 
some attempts hit the timeout and therefore their corresponding data points had 
to be eliminated from the boxplot. It is of note, though, that whenever RO with 
configuration (v, k) hit the timeout for the synthesis problem s ∈ S, so did Mut, 
hence both (RO, s, (v, k)) and (Mut, s, (v, k)) are omitted from the 
boxplots. But the inverse did not hold: RO hit the timeout fewer times than 
Mut, hence RO is measured at a disadvantage (i.e., more data points means more 
opportunities to show worse results). Since insert collected the highest number 
of timeouts, we equalised it to remove non-matched entries across the two tools. 

Fig. 7: Distributions of log2(number of attempted rule applications). 

Despite RO’s potential measurement disadvantage, the boxplots depict it as a 
clear winner. Not only does RO fire fewer rules in all the cases, but, with the exception 
of insert, it is also more stable under the proof search perturbations: it varies a 
few orders of magnitude less than Mut does for the same configurations. Fig. 7 
supports this observation by offering a more detailed view of the distributions 
of the numbers of fired rules per synthesis configuration. Taller bars show that 
more processes fall in the same range (wrt. the number of fired rules). For lcopy, 
tcopy, and tcopy-ptr it is clear that Mut has a wider distribution of the number 
of fired rules, that is, Mut is more sensitive to the perturbations than RO. We 
additionally make some further observations: 


— Despite a similar distribution wrt. the numbers of fired rules in the case of 
insert, RO produces compact ASTs of size 53 for all perturbations, while Mut 
fluctuates between producing ASTs of size 53 and 62. 

— For all the synthesis tasks, RO produced the same AST irrespective of the 
tool’s perturbation. In contrast, there were synthesis problems for which Mut 
produced as many as 3 different ASTs for different perturbations, none of 
which were as concise as the one produced by RO for the same configuration. 

— The outliers of (Mut, lcopy) are extremely high, firing close to 40k rules. 

— The outliers of (RO, tcopy) are still below the median values of (Mut, tcopy). 

— Except for insert, the best performance of Mut, in terms of fired rules, barely 
overlaps with the worst performance of RO. 

— Except for insert, the medians of RO are closer to the lowest value of the 
data distribution, as opposed to Mut where the tendency is to fire more rules. 

— In absolute values, RO hit the 2-minute timeout 102 times compared to Mut, 
which hit the timeout 132 times. 


We believe that the main take-aways from this set of experiments, along with 
the positive answer to the Research Question 4, are as follows: 

— RO is more stable wrt. the number of rules fired and the size of the generated 
AST for many reasonable proof search perturbations. 

— RO produces better programs, which avoid spurious statements, irrespective 
of the perturbation and number of rules fired during the search. 


5 Limitations and Discussion 


Flexible aliasing. Separating conjunction asserts that the heap can be split into 
two disjoint parts, or in other words it carries implicit non-aliasing information. 
Specifically, x ↦ _ * y ↦ _ states that x and y are non-aliased. Such 
assertions can be used to specify methods as below: 

{x ↦ n * y ↦ m * ret ↦ x} sum(x, y, ret) {x ↦ n * y ↦ m * ret ↦ n + m} 


Occasionally, enforcing x and y to be non-aliased is too restrictive, rejecting 
safe calls such as sum(p, p, q). Approaches to support immutable annotations 
permit such calls without compromising safety if both pointers, aliased or not, 
are annotated as read-only [9,13]. BoSSL does not support such flexible aliasing. 

Precondition strengthening. Let us assume that srtl(x, n, lo, hi, a1, a2, a3) is an 
inductive predicate that describes a sorted linked list of size n with lo and 
hi being the list’s minimum and maximum payload value, respectively. Now, 
consider the following synthesis goal: 

{x, y}; {y ↦ x * srtl(x, n, lo, hi, M, M, M)} ⇝ {y ↦ n * srtl(x, n, lo, hi, M, M, M)}. 

As stated, the goal clearly requires the program to compute the length n of 
the list. Imagine that we already have a function that does precisely that, even 
though it is stated in terms of a list predicate that does not enforce sortedness: 

{ret ↦ x * ls(x, n, a1, a2, a3)} length(x, ret) {ret ↦ n * ls(x, n, a1, a2, a3)} 

To solve the initial goal, the synthesiser could weaken the given precondition 
srtl(x, n, lo, hi, M, M, M) to ls(x, n, M, M, M), and then successfully synthesise a 
call to the length method. Unfortunately, the resulting goal, obtained after having 
emitted the call to length and applying FRAME, is unsolvable: 

{x, y}; {ls(x, n, M, M, M)} ⇝ {srtl(x, n, lo, hi, M, M, M)}, 

since the logic does not allow strengthening an arbitrary linked list to a sorted 
linked list without retaining the prior knowledge. Had we adopted an 
alternative approach to read-only annotations [9,13], allowing the caller to retain 
the full permission of the sorted list, the postcondition of length would not 
contain the list-related part of the heap and would only quantify over the result 
pointer {ret ↦ n}, thus leading to the solvable goal below: 

{x, y}; {srtl(x, n, lo, hi, M, M, M)} ⇝ {srtl(x, n, lo, hi, M, M, M)}. 

One straightforward way for BoSSL to cope with this limitation is to simply 
add a version of length annotated with specifications that cater to srtl. 


Overcoming the limitations. While the “caller keeps the permission” kind of ap- 
proach would buy us flexible aliasing and calls with weaker specifications, it 
would compromise the benefits discussed earlier with respect to the granular- 
ity of borrow-polymorphic inductive predicates. One possible solution to gain 
the best of both worlds would be to design a permission system which allows 
both borrow-polymorphic inductive predicates as well as read-only modalities to 
co-exist, where the latter would override the predicate’s mixed permissions. In 
other words, the read-only modality enforces a read-only treatment of the pred- 
icate irrespective of its permission arguments, while the permission arguments 
control the treatment of a mutable predicate. The theoretical implications of 
such a design choice are left as part of future work. 

Extending read-only specifications to concurrency. Thus far we have only inves- 
tigated the synthesis of sequential programs, for which read-only annotations 
helped to reduce the synthesis cost. Assuming that the synthesiser has the capability 
to synthesise concurrent programs as well, the borrows annotation mechanism 
in its current form may not be able to cope with general resource sharing. 
This is because a callee which requires read-only permissions to a particular 
symbolic heap still consumes the entire required symbolic heap from the caller, 
despite the read-only requirement; hence, there is no space left for sharing. That 
said, the recently proposed alternative approaches to introduce read-only an- 
notations [9,13] have no formal support for heap sharing in the presence of 
concurrency either. To address these challenges, we could adopt a more sophisticated 
approach based on a fractional permissions mechanism [7,8,20,25,30], but 
this is left as part of future work since it is orthogonal to the current scope. 


6 Related Work 


Language design. There is a large body of work on integrating access permissions 
into practical type systems [5, 16,42] (see, e.g., the survey by Clarke et al. [10]). 
One notable such system, which is the closest in its spirit to our proposal, is 
the borrows type system of the Rust programming language [1], proved safe with 
RustBelt [22]. Similar to our approach, borrows in Rust are short-lived: in 
Rust they share the scope with the owner; in our approach they do not escape 
the scope of a method call. In contrast with our work, Rust’s type system care- 
fully manages different references to data by imposing strict sharing constraints, 
whereas in our approach the treatment of aliasing is taken care of automatically 
by building on Separation Logic. Moreover, Rust allows read-only borrows to be 
duplicated, while in the sequential setting of BoSSL this is currently not possible. 

Somewhat related to our approach, Naden et al. propose a mechanism for 
borrowing permissions, albeit integrated as a fundamental part of a type sys- 
tem [31]. Their type system comes equipped with change permissions which 
enforce the borrowing requirements and describe the effects of the borrowing 
upon return. As a result of treating permissions as first-class values, we do not 
need to explicitly describe the flow of permissions for each borrow since this is 
controlled by a mix of the substitution and unification principles. 


Program verification with read-only permissions. Boyland introduced fractional 
permissions to statically reason about interference in the presence of shared-memory 
concurrency [8]. A permission p denotes full resource ownership (i.e. 
read-write access) when p = 1, while p ∈ (0, 1) denotes a partial ownership (i.e. 
read-only access). To leverage permissions in practice, a system must support 
two key operations: permission splitting and permission borrowing. Permission 
splitting (and merging back) follows the split rule: 
$x \overset{p}{\mapsto} a = x \overset{p_1}{\mapsto} a * x \overset{p_2}{\mapsto} a$, 
with $p = p_1 + p_2$ and $p, p_1, p_2 \in (0, 1]$. Permission borrowing refers to the safe manipulation 
of permissions: a callee may remove some permissions from the caller, use them 
temporarily, and give them back upon return. 
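
The split rule and its side conditions can be illustrated with exact rational arithmetic. The Python sketch below is our own illustration and is not taken from any of the cited tools; the function names are hypothetical. It enforces that every permission involved lies in (0, 1], so a split that would leave one side with permission 0 is rejected.

```python
from fractions import Fraction
from typing import Tuple


def is_valid(p: Fraction) -> bool:
    """A fractional permission must lie in the half-open interval (0, 1]."""
    return 0 < p <= 1


def can_write(p: Fraction) -> bool:
    """Only the full permission p = 1 grants read-write access."""
    return p == 1


def split(p: Fraction, p1: Fraction) -> Tuple[Fraction, Fraction]:
    """Split p into p1 and p2 = p - p1; all three must be valid permissions."""
    p2 = p - p1
    if not (is_valid(p) and is_valid(p1) and is_valid(p2)):
        raise ValueError(f"cannot split {p} into {p1} and {p2}")
    return p1, p2


def merge(p1: Fraction, p2: Fraction) -> Fraction:
    """Merge two permissions to the same location back into p1 + p2."""
    p = p1 + p2
    if not (is_valid(p1) and is_valid(p2) and is_valid(p)):
        raise ValueError(f"cannot merge {p1} and {p2}")
    return p


if __name__ == "__main__":
    half_a, half_b = split(Fraction(1), Fraction(1, 2))
    print(half_a, half_b, can_write(half_a))   # two read-only halves
    print(can_write(merge(half_a, half_b)))    # full ownership restored
```

Attempting `split(Fraction(1), Fraction(1))` raises `ValueError` in this sketch, because the remainder would be 0, which is not a valid fractional permission.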

Though it exists, tool support for fractional permissions is still scarce. Leino 
and Müller introduced a mechanism for storing fractional permissions in data 
structures via dedicated access predicates in the CHALICE verification tool [27]. 
To promote generic specifications, Heule et al. advanced CHALICE with instantiable 
abstract permissions, allowing automatic firing of the split rule and symbolic 
borrowing [20]. VERIFAST [21] is guided by contracts written in Separation Logic 
and assumes the existence of lemmas to cater for permission splitting. VIPER [30] 
is an intermediate language which supports various permission models, including 
abstract fractional permissions [4,43]. Similar to CHALICE, the permissions 
are attached to memory locations using an accessibility predicate. To reason 
about it, VIPER uses permission-aware assertions and assumptions, which corre- 
spond in our approach to the unification and the substitution operations, respec- 
tively. Like VIPER, we enhance the basic memory constructors, that is blocks 
and points-to, to account for permissions, but in contrast, the CALL rule in our 
approach is standard, i.e., not permission-aware. 

These tools, along with others [3, 18], offer strong correctness guarantees in 
the presence of resource sharing. However, there is a class of problems, namely 
those involving predicates with mixed permissions, whose guarantees are weak- 
ened due to the general fractional permissions model behind these tools. We next 
exemplify this class of problems in a sequential setting. We start by considering 
a method which resets the values stored in a linked-list while maintaining its 
shape (p < 1 below is to enforce the immutable shape): 

{p < 1; ls(x, S)[1, p]} void reset(loc x) {ls(x, {0})[1, p]}. 

Assume a call to this method, namely reset(y). The caller has full permission 
over the entire list passed as argument, that is ls(y, B)[1, 1]. This attempt leads 
to two issues. The first has to do with splitting the payload’s permission (before 
the call) such that it matches the callee’s postcondition. To be able to modify the 
list’s payload, the callee must get the payload’s full ownership, hence the caller 
should retain 0: ls(y, B)[1, 1] = ls(y, B)[0, 1/2] * ls(y, B)[1, 1/2]. But 0 is not a valid 
fractional permission. The second issue surfaces while attempting to merge the 
permissions after the call: ls(y, B)[0, 1/2] * ls(y, {0})[1, 1/2] is invalid since the two 
instances of ls have incompatible arguments (namely B and {0}). To avoid such 
problems, BoSSL abandons the split rule and instead always manipulates full 
ownership of resources, hence it does not use fractions. This compromise, along 
with the support for symbolic borrows, allows ROBOSUSLIK to guarantee read- 
only-ness in a sequential setting while avoiding the aforementioned issues. More 
investigations are needed in order to lift this result to concurrency reasoning. 
Another feature which distinguishes the current work from those based on frac- 
tional permissions, is the support for permissions as parameters of the predicate, 
which in turn supports the definition of predicates with mixed permissions. 

Immutable specifications on top of Separation Logic have also been studied by 
David and Chin [13]. Unlike our approach which treats borrows as polymorphic 
variables that rely on the basic concept of substitution, their annotation mech- 
anism comprises only constants and requires a specially tailored entailment on 
top of enhanced proof rules. Since callers retain the heap ownership upon calling 
a method with read-only requirements, their machinery supports flexible aliasing 
and cut-point preservation—features that we could not find a good use for in the 
context of program synthesis. An attempt to extend David and Chin’s work by 
adding support for predicates with mixed permissions [11] suffers from significant 
annotation overhead. Specifically, it employs a mix of mutable, immutable, and 
absent permissions, so that each mutable heaplet in the precondition requires a 
corresponding matching heaplet annotated with absent in the postcondition. 

Charguéraud and Pottier [9] extended Separation Logic with RO assertions 
that can be freely duplicated or discarded. Their approach creates lexically- 
scoped copies of the RO-permissions before emitting a call, which, in turn, in- 
volves discarding the corresponding heap from the postcondition to guarantee a 
sound RO-modality. Adapting this modality to program synthesis guided by pre- 
and postconditions would require a completely new system of deductive synthesis 
since most of the rules in SSL are not designed to handle the discardable RO- 
heaps. In contrast, BoSSL supports permission-parametric predicates (e.g., (9)) 
requiring only minimal adjustments to its host logic, i.e., SSL. 


Program synthesis. BoSSL continues a long line of work on program synthesis 
from formal specifications [26,36,40,41,44] and in particular, deductive synthesis 
[14,23,29,33,34], which can be characterised as search in the space of proofs 
of program correctness (rather than in the space of programs). Most directly, 
BoSSL builds upon our prior work on SSL [34] and enhances its specification 
language with read-only annotations. In that sense, the present work is also re- 
lated to various approaches that use non-functional specifications as input to 
synthesis. It is common to use syntactic non-functional specifications, such as 
grammars [2], sketches [36,40], or restrictions on the number of times a compo- 
nent can be used [19]. More recent work has explored semantic non-functional 
specifications, including type annotations for resource consumption [24] and se- 
curity/privacy [17,35,39]. This research direction is promising because (a) anno- 
tations often enable the programmer to express a strong specification concisely, 
and (b) checking annotations is often more compositional (i.e., fails faster) than 
checking functional specifications, which makes synthesis more efficient. In the 
present work we have demonstrated that both of these benefits of non-functional 
specifications also hold for the read-only annotations of BoSSL. 


7 Conclusion 


In this work, we have advanced the state of the art in program synthesis by 
highlighting the benefits of guiding the synthesis process with information about 
memory access permissions. We have designed the logic BoSSL and implemented 
the tool ROBOSUSLIK, showing that a minimalistic discipline for read-only permissions 
already brings significant improvements wrt. the performance and robustness 
of the synthesiser, as well as wrt. the quality of its generated programs. 


Abstract. We propose a general proof technique to show that a predicate 
is sound, that is, prevents stuck computation, with respect to a 
big-step semantics. This result may look surprising, since in big-step se- 
mantics there is no difference between non-terminating and stuck com- 
putations, hence soundness cannot even be expressed. The key idea is 
to define constructions yielding an extended version of a given arbitrary 
big-step semantics, where the difference is made explicit. The extended 
semantics are exploited in the meta-theory, notably they are necessary 
to show that the proof technique works. However, they remain transpar- 
ent when using the proof technique, since it consists in checking three 
conditions on the original rules only, as we illustrate by several examples. 


1 Introduction 


The semantics of programming languages or software systems specifies, for each 
program/system configuration, its final result, if any. In the case of non-existence 
of a final result, there are two possibilities: 


— either the computation stops with no final result, and there is no means to 
compute further: stuck computation, 
— or the computation never stops: non-termination. 


There are two main styles to define operationally a semantic relation: the 
small-step style [34,35], on top of a reduction relation representing single com- 
putation steps, or directly by a set of rules as in the big-step style [28]. Within a 
small-step semantics it is straightforward to make the distinction between stuck 
and non-terminating computations, while a typical drawback of the big-step style 
is that they are not distinguished (no judgement is derived in both cases). 

For this reason, even though big-step semantics is generally more abstract, 
and sometimes more intuitive to design and therefore to debug and extend, in the 
literature much more effort has been devoted to study the meta-theory of small- 
step semantics, providing properties, and related proof techniques. Notably, the 
soundness of a type system (typing prevents stuck computation) can be proved 
by progress and subject reduction (also called type preservation) [40]. 

Our quest is then to provide a general proof technique to prove the soundness 
of a predicate with respect to an arbitrary big-step semantics. How can we 
achieve this result, given that in big-step formulation soundness cannot even 
be expressed, since non-termination is modelled as the absence of a final result 
exactly like stuck computation? The key idea is the following: 


1. We define constructions yielding an extended version of a given arbitrary big- 
step semantics, where the difference between stuckness and non-termination 
is made explicit. In a sense, these constructions show that the distinction 
was “hidden” in the original semantics. 

2. We provide a general proof technique by identifying three sufficient condi- 
tions on the original big-step rules to prove soundness. 


Keypoint (2)’s three sufficient conditions are local preservation, ∃-progress, 
and ∀-progress. To prove that the three conditions actually ensure 
soundness, setting up the extended semantics from the given one is necessary, 
since otherwise, as said above, we could not even express the property. 

However, the three conditions deal only with the original rules of the given 
big-step semantics. This means that, practically, in order to use the technique 
there is no need to deal with the extended semantics. This implies, in particular, 
that our approach does not increase the original number of rules. Moreover, the 
sufficient conditions are checked only on single rules, which makes explicit the 
proof fragments typically needed in a proof of soundness. Even though this is 
not exploited in this paper, this form of locality means modularity, in the sense 
that adding a new rule implies adding the corresponding proof fragment only. 

As an important by-product, in order to formally define and prove correct 
the keypoints (1) and (2), we propose a formalisation of “what is a big-step 
semantics” which captures its essential features. Moreover, we support our ap- 
proach by presenting several examples, demonstrating that: on the one hand, 
their soundness proof can be easily rephrased in terms of our technique, that 
is, by directly reasoning on big-step rules; on the other hand, our technique is 
essential when the property to be checked (for instance, the soundness of a type 
system) is not preserved by intermediate computation steps, whereas it holds 
for the final result. On a side note, our examples concern type systems, but the 
meta-theory we present in this work holds for any predicate. 

We describe now in more detail the constructions of keypoint (1). Starting 
from an arbitrary big-step judgment c ⇒ r that evaluates configurations c into 
results r, the first construction produces an enriched judgement c ⇒_tr t where t 
is a trace, that is, the (finite or infinite) sequence of all the (sub)configurations 
encountered during the evaluation. In this way, by interpreting coinductively the 
rules of the extended semantics, an infinite trace models divergence (whereas 
no result corresponds to stuck computation). The second construction is in a 
sense dual. It is the algorithmic version of the well-known technique presented 
in Exercise 3.5.16 from the book [33] of adding a special result wrong explicitly 
modelling stuck computations (whereas no result corresponds to divergence). 
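
To illustrate the idea behind the second construction, the Python sketch below gives a big-step evaluator for a tiny expression language in which stuck computations return an explicit `wrong` result. The language (numbers, booleans, addition, conditionals) and all names are our own choices for illustration and are not the paper's construction. Since this language has no loops, every evaluation terminates, so only stuckness is made visible here.

```python
WRONG = "wrong"  # special result that explicitly models a stuck computation

# Expressions are nested tuples:
#   ("num", n) | ("bool", b) | ("add", e1, e2) | ("if", e0, e1, e2)


def eval_wrong(e):
    """Big-step evaluation that returns WRONG instead of getting stuck."""
    tag = e[0]
    if tag in ("num", "bool"):
        return e  # results evaluate to themselves
    if tag == "add":
        left = eval_wrong(e[1])
        if left == WRONG or left[0] != "num":
            return WRONG  # propagate wrong, or stuck: left operand is not a number
        right = eval_wrong(e[2])
        if right == WRONG or right[0] != "num":
            return WRONG  # propagate wrong, or stuck: right operand is not a number
        return ("num", left[1] + right[1])
    if tag == "if":
        cond = eval_wrong(e[1])
        if cond == WRONG or cond[0] != "bool":
            return WRONG  # propagate wrong, or stuck: condition is not a boolean
        return eval_wrong(e[2] if cond[1] else e[3])
    return WRONG  # unknown construct: no rule applies


if __name__ == "__main__":
    print(eval_wrong(("add", ("num", 1), ("num", 2))))
    print(eval_wrong(("add", ("bool", True), ("num", 2))))
```

Without the `WRONG` cases, the second call would simply have no result, which is exactly how a plain big-step semantics would also treat a diverging program.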

By trace semantics and wrong semantics we can express two flavours of sound- 
ness, soundness-may and soundness-must, respectively, and show the correctness 
of the corresponding proof technique. This achieves our original aim, and it 
should be noted that we define soundness with respect to a big-step semantics 
within a big-step formulation, without resorting to a small-step style (indeed, 
the two extended semantics are themselves big-step). 

Lastly, we consider the issue of justifying on a formal basis that the two 
constructions are correct with respect to their expected meaning. For instance, 
for the wrong semantics we would like to be sure that all the cases are covered. 
To this end, we define a third construction, dubbed PEV for “partial evalua- 
tion”, which makes explicit the computations of a big-step semantics, intended 
as the sequences of execution steps of the naturally associated evaluation algo- 
rithm. Formally, we obtain a reduction relation on approximated proof trees, 
so termination, non-termination and stuckness can be defined as usual. Then, 
the correctness of traces and wrong constructions is proved by showing they are 
equivalent to PEV for diverging and stuck computations, respectively. 

In Sect. 2 we illustrate the meta-theory on a running example. In Sect. 3 we 
define the trace and wrong constructions. In Sect. 4 we express soundness in the 
must and may flavours, introduce the proof technique, and prove its correctness. 
In Sect. 5 we show in detail how to apply the technique to the running example, 
and other significant examples. In Sect. 6 we introduce the third construction and 
state that the three constructions are equivalent. Finally, in Sect. 7 and Sect. 8 we discuss
related and further work and summarise our contribution. An extended version 
including an additional example, proofs omitted for lack of space, and technical 
details on the PEV semantics, can be found at http://arxiv.org/abs/2002.08738. 


2 A meta-theory for big-step semantics 


We introduce a formalisation of “what is a big-step semantics” that captures its 
essential features, subsuming a large class of examples (as testified in Sect. 5). 
This enables a general formal reasoning on an arbitrary big-step semantics. 

A big-step semantics is a triple $\langle C, R, \mathcal{R} \rangle$ where:


— $C$ is a set of configurations $c$.

— $R \subseteq C$ is a set of results $r$. We define judgments $j \equiv c \Rightarrow r$, meaning that
configuration $c$ evaluates to result $r$. Set $C(j) = c$ and $R(j) = r$.

— $\mathcal{R}$ is a set of rules $\rho$ of shape

\[ \frac{j_1 \ \ldots\ j_n \quad j_{n+1}}{c \Rightarrow R(j_{n+1})} \qquad \text{also written in inline format: } \mathit{rule}(j_1 \ldots j_n,\ j_{n+1},\ c) \]

with $c \in C \setminus R$, where $j_1 \ldots j_n$ are the dependencies and $j_{n+1}$ is the continuation.
Set $C(\rho) = c$ and, for $i \in 1..n+1$, $C(\rho, i) = C(j_i)$ and $R(\rho, i) = R(j_i)$.

— For each result $r \in R$, we implicitly assume a single axiom $\frac{}{r \Rightarrow r}$. Hence, the
only derivable judgment for $r$ is $r \Rightarrow r$, which we will call a trivial judgment.


We will use the inline format, more concise and manageable, for the development 
of the meta-theory, e.g., in constructions. 

A rule corresponds to the following evaluation process for a non-result con- 
figuration: first, dependencies are evaluated in the given order, then the contin- 
uation is evaluated and its result is returned as result of the entire computation. 
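$e ::= x \mid v \mid e_1\,e_2 \mid \mathsf{succ}\ e \mid e_1 \oplus e_2$   expression
$v ::= n \mid \lambda x.e$   value

(VAL) $\frac{}{v \Rightarrow v}$
(APP) $\frac{e_1 \Rightarrow \lambda x.e \quad e_2 \Rightarrow v_2 \quad e[v_2/x] \Rightarrow v}{e_1\,e_2 \Rightarrow v}$
(SUCC) $\frac{e \Rightarrow n}{\mathsf{succ}\ e \Rightarrow n+1}$
(CHOICE) $\frac{e_i \Rightarrow v}{e_1 \oplus e_2 \Rightarrow v}$  $i = 1, 2$

(app) $\mathit{rule}(e_1 \Rightarrow \lambda x.e \ \ e_2 \Rightarrow v_2,\ e[v_2/x] \Rightarrow v,\ e_1\,e_2)$
(succ) $\mathit{rule}(e \Rightarrow n,\ n+1 \Rightarrow n+1,\ \mathsf{succ}\ e)$
(choice) $\mathit{rule}(\epsilon,\ e_i \Rightarrow v,\ e_1 \oplus e_2)$  $i = 1, 2$
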

Fig. 1. Example of big-step semantics 
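
The evaluation process described by the rules of Fig. 1 can be sketched as a small Python function. This is only an illustration under stated assumptions: the tagged-tuple encoding of terms and the names `results`, `subst` and `is_value` are hypothetical, substitution is only applied to closed values, and the function terminates only on terms all of whose computations terminate. It collects every result $r$ such that $e \Rightarrow r$, evaluating dependencies in order and then the continuation.

```python
# Terms as tagged tuples (a hypothetical encoding, not taken from the paper):
#   ("num", n) | ("lam", x, body) | ("var", x) | ("app", e1, e2)
#   | ("succ", e) | ("choice", e1, e2)

def is_value(e):
    return e[0] in ("num", "lam")

def subst(e, x, v):
    """Replace free occurrences of variable x in e by the closed value v."""
    tag = e[0]
    if tag == "num":
        return e
    if tag == "var":
        return v if e[1] == x else e
    if tag == "lam":
        return e if e[1] == x else ("lam", e[1], subst(e[2], x, v))
    if tag == "succ":
        return ("succ", subst(e[1], x, v))
    # "app" and "choice" both carry two subterms
    return (tag, subst(e[1], x, v), subst(e[2], x, v))

def results(e):
    """Set of all r with e => r (assumes every computation of e terminates)."""
    tag = e[0]
    if is_value(e):                          # implicit axiom (val)
        return {e}
    if tag == "app":                         # (app): two dependencies, then continuation
        out = set()
        for f in results(e[1]):              # first dependency
            if f[0] != "lam":
                continue                     # no rule has this premise: no result
            for v2 in results(e[2]):         # second dependency
                out |= results(subst(f[2], f[1], v2))   # continuation
        return out
    if tag == "succ":                        # (succ): continuation n+1 => n+1 is trivial
        return {("num", r[1] + 1) for r in results(e[1]) if r[0] == "num"}
    if tag == "choice":                      # (choice), i = 1, 2
        return results(e[1]) | results(e[2])
    return set()                             # free variable: no rule applies

zero, one = ("num", 0), ("num", 1)
inc = ("lam", "x", ("succ", ("var", "x")))
print(results(("app", inc, ("choice", zero, one))))
```

Note that an empty result set does not say whether the term is stuck or (were the function to loop) divergent, which is exactly the distinction the extended semantics is meant to recover.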


Rules as defined above specify an inference system [1,30], whose inductive 
interpretation is, as usual, the semantic relation. However, they carry slightly 
more structure with respect to standard inference rules. Notably, premises are 
a sequence rather than a set, and the last premise plays a special role. Such 
additional structure does not affect the semantic relation defined by the rules, 
but allows abstract reasoning about an arbitrary big-step semantics, in particular 
it is relevant for defining the three constructions. In the following, we will write 
$\mathcal{R} \vdash c \Rightarrow r$ when the judgment $c \Rightarrow r$ is derivable in $\mathcal{R}$.

As customary, the (infinite) set of rules $\mathcal{R}$ is described by a finite set of
meta-rules, each one with a finite number of premises. As a consequence, the number of
premises of rules is not only finite but bounded. Since we have no notion of
meta-rule, we model this feature (relevant in the following) as an explicit assumption:

(BP) there exists $b \in \mathbb{N}$ such that, for each $\rho \equiv \mathit{rule}(j_1 \ldots j_n, j_{n+1}, c)$, $n \le b$.

We end this section by illustrating the above definitions and conditions with a simple
example: a $\lambda$-calculus with natural constants, successor and non-deterministic
choice, shown in Fig. 1. We present this example as an instance of our definition:


— Configurations and results are expressions and values, respectively.³
— To have the set of (meta-)rules in our required shape, abbreviated in inline
format in the bottom section of the figure:
  • axiom (val) can be omitted (it is implicitly assumed)
  • in (app) we consider premises as a sequence rather than a set (the third
    premise is the continuation)
  • in (succ), which has no continuation, we add a dummy continuation
  • on the contrary, in (choice) there is only the continuation (dependencies
    are the empty sequence, denoted $\epsilon$ in the inline format).


Note that (app) corresponds to the standard left-to-right evaluation order. We
could have chosen the right-to-left order instead:

(app-r) $\mathit{rule}(e_2 \Rightarrow v_2 \ \ e_1 \Rightarrow \lambda x.e,\ e[v_2/x] \Rightarrow v,\ e_1\,e_2)$

or even opt for a non-deterministic approach by taking both rules (app) and


3 In general, configurations may include additional components, see Sect. 5.2. 



(app-r). As said above, these different choices do not affect the semantic relation
$c \Rightarrow r$ defined by the inference system, which is always the same. However, they
will affect the way the extended semantics distinguishing stuck computation and
non-termination is constructed. Indeed, if the evaluation of $e_1$ and $e_2$ is stuck
and non-terminating, respectively, we should obtain stuck computation with rule
(app) and non-termination with rule (app-r).

In summary, to see a typical big-step semantics as an instance of our defi- 
nition, it is enough to assume an order (or more than one) on premises, make 
implicit the axiom for results, and add a dummy continuation when needed. In 
the examples (Sect. 5), we will assume a left-to-right order on premises, and 
omit dummy continuations to keep a more familiar style. In the technical part 
(Sect. 3, Sect. 4 and Sect. 6) we will adopt the inline format. 


3 Extended semantics 


In the following, we assume a big-step semantics $\langle C, R, \mathcal{R} \rangle$ and describe two
constructions which make the distinction between non-termination and stuck
computation explicit. In both cases, the approach is based on well-known ideas;
the novel contribution is that, thanks to the meta-theory in Sect. 2, we provide
a general construction working on an arbitrary big-step semantics.


3.1 Traces 


We denote by $C^*$, $C^\omega$, and $C^\infty = C^* \cup C^\omega$, respectively, the sets of finite, infinite,
and possibly infinite traces, that is, sequences of configurations. We write $t \cdot t'$
for the concatenation of $t \in C^*$ with $t' \in C^\infty$.

We derive, from the judgement $c \Rightarrow r$, an enriched big-step judgement $c \Rightarrow_{tr} t$
with $t \in C^\infty$. Intuitively, $t$ keeps track of all the configurations visited during the
evaluation, starting from $c$ itself. To define the trace semantics, we construct,
starting from $\mathcal{R}$, a new set of rules $\mathcal{R}_{tr}$, which are of two kinds:


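trace introduction  These rules enrich the standard semantics by finite traces:
for each $\rho \equiv \mathit{rule}(j_1 \ldots j_n, j_{n+1}, c)$ in $\mathcal{R}$, and finite traces $t_1, \ldots, t_{n+1} \in C^*$,
we add the rule

\[ \frac{C(j_1) \Rightarrow_{tr} t_1 \cdot R(j_1) \quad \ldots \quad C(j_{n+1}) \Rightarrow_{tr} t_{n+1} \cdot R(j_{n+1})}{c \Rightarrow_{tr} c \cdot t_1 \cdot R(j_1) \cdot \ldots \cdot t_{n+1} \cdot R(j_{n+1})} \]

We denote this rule by $\mathit{trace}(\rho, t_1, \ldots, t_{n+1})$, to highlight the relationship
with the original rule $\rho$. We also add one axiom $\frac{}{r \Rightarrow_{tr} r}$ for each result $r$.
Such rules derive judgements $c \Rightarrow_{tr} t$ with $t \in C^*$, for convergent computations.

divergence propagation  These rules propagate divergence, that is, if a
(sub)configuration in the premise of a rule diverges, then the subsequent
premises are ignored and the configuration in the conclusion diverges as
well: for each $\rho \equiv \mathit{rule}(j_1 \ldots j_n, j_{n+1}, c)$ in $\mathcal{R}$, index $i \in 1..n+1$, finite traces
$t_1, \ldots, t_{i-1} \in C^*$, and infinite trace $t$, we add the rule:

\[ \frac{C(j_1) \Rightarrow_{tr} t_1 \cdot R(j_1) \quad \ldots \quad C(j_{i-1}) \Rightarrow_{tr} t_{i-1} \cdot R(j_{i-1}) \quad C(j_i) \Rightarrow_{tr} t}{c \Rightarrow_{tr} c \cdot t_1 \cdot R(j_1) \cdot \ldots \cdot t_{i-1} \cdot R(j_{i-1}) \cdot t} \]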
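(TRACE-APP) $\frac{e_1 \Rightarrow_{tr} t_1 \cdot \lambda x.e \quad e_2 \Rightarrow_{tr} t_2 \cdot v_2 \quad e[v_2/x] \Rightarrow_{tr} t \cdot v}{e_1\,e_2 \Rightarrow_{tr} e_1\,e_2 \cdot t_1 \cdot \lambda x.e \cdot t_2 \cdot v_2 \cdot t \cdot v}$  $t_1, t_2, t \in C^*$

(DIV-APP-1) $\frac{e_1 \Rightarrow_{tr} t}{e_1\,e_2 \Rightarrow_{tr} e_1\,e_2 \cdot t}$  $t \in C^\omega$

(DIV-APP-2) $\frac{e_1 \Rightarrow_{tr} t_1 \cdot \lambda x.e \quad e_2 \Rightarrow_{tr} t}{e_1\,e_2 \Rightarrow_{tr} e_1\,e_2 \cdot t_1 \cdot \lambda x.e \cdot t}$  $t_1 \in C^*,\ t \in C^\omega$

(DIV-APP-3) $\frac{e_1 \Rightarrow_{tr} t_1 \cdot \lambda x.e \quad e_2 \Rightarrow_{tr} t_2 \cdot v_2 \quad e[v_2/x] \Rightarrow_{tr} t}{e_1\,e_2 \Rightarrow_{tr} e_1\,e_2 \cdot t_1 \cdot \lambda x.e \cdot t_2 \cdot v_2 \cdot t}$  $t_1, t_2 \in C^*,\ t \in C^\omega$
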

Fig. 2. Trace semantics for application 


We denote this rule by $\mathit{prop}(\rho, i, t_1, \ldots, t_{i-1}, t)$ to highlight the relationship
with the original rule $\rho$. These rules derive judgements $c \Rightarrow_{tr} t$ with $t \in C^\omega$,
modelling diverging computations.


The inference system $\mathcal{R}_{tr}$ must be interpreted coinductively, to properly
model diverging computations. Indeed, since there is no axiom introducing an
infinite trace, such traces can be derived only by an infinite proof tree. We write
$\mathcal{R}_{tr} \vdash c \Rightarrow_{tr} t$ when the judgment $c \Rightarrow_{tr} t$ is derivable in $\mathcal{R}_{tr}$.

We show in Fig. 2 the rules obtained starting from meta-rule (app) of the
example (for other meta-rules the outcome is analogous).

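For instance, set $\Omega = \omega\,\omega = (\lambda x.x\,x)(\lambda x.x\,x)$, and $t_\Omega$ the infinite trace
$\Omega \cdot \omega \cdot \omega \cdot \Omega \cdot \omega \cdot \omega \cdot \ldots$; it is easy to see that the judgment $\Omega \Rightarrow_{tr} t_\Omega$ can be derived
by the following infinite tree:⁴

\[
\frac{
  \dfrac{}{\omega \Rightarrow_{tr} \omega}\ \text{(TRACE-VAL)} \quad
  \dfrac{}{\omega \Rightarrow_{tr} \omega}\ \text{(TRACE-VAL)} \quad
  \dfrac{\vdots}{\Omega = (x\,x)[\omega/x] \Rightarrow_{tr} t_\Omega}\ \text{(DIV-APP-3)}
}{\Omega \Rightarrow_{tr} \Omega \cdot \omega \cdot \omega \cdot t_\Omega = t_\Omega}\ \text{(DIV-APP-3)}
\]

Note that only the judgment $\Omega \Rightarrow_{tr} t_\Omega$ can be derived, that is, the trace semantics
of $\Omega$ is uniquely determined to be $t_\Omega$, since the infinite proof tree forces the
equation $t_\Omega = \Omega \cdot \omega \cdot \omega \cdot t_\Omega$. This example is a cyclic proof, but there are divergent
computations with no circular derivation.
The trace construction is conservative with respect to the original semantics,
that is, converging computations are not affected.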


Theorem 1. $\mathcal{R}_{tr} \vdash c \Rightarrow_{tr} t \cdot r$ for some $t \in C^*$ iff $\mathcal{R} \vdash c \Rightarrow r$.
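
The trace construction can be illustrated on the deterministic fragment of the running example (application and successor, without choice) by a lazy Python generator. This is a minimal sketch under assumptions: the tagged-tuple term encoding, the names `trace`, `subst` and `Stuck` are hypothetical, and substitution is only applied to closed values. The generator yields the configurations visited, in the order prescribed by the trace rules, and returns the result when the computation converges; on a diverging term it simply never stops yielding, and on a stuck term it raises, mirroring the fact that the trace semantics assigns no trace to stuck computations.

```python
from itertools import islice

class Stuck(Exception):
    """Raised when no rule applies (the trace semantics derives no judgment)."""

def subst(e, x, v):
    """Replace free occurrences of variable x in e by the closed value v."""
    tag = e[0]
    if tag == "num":
        return e
    if tag == "var":
        return v if e[1] == x else e
    if tag == "lam":
        return e if e[1] == x else ("lam", e[1], subst(e[2], x, v))
    if tag == "succ":
        return ("succ", subst(e[1], x, v))
    return ("app", subst(e[1], x, v), subst(e[2], x, v))

def trace(e):
    """Yield the configurations visited while evaluating e; return the result."""
    yield e                                   # a trace starts with c itself
    tag = e[0]
    if tag in ("num", "lam"):                 # axiom: the trace of a result is itself
        return e
    if tag == "app":
        f = yield from trace(e[1])            # t1 . R(j1)
        if f[0] != "lam":
            raise Stuck(e)
        v2 = yield from trace(e[2])           # t2 . R(j2)
        return (yield from trace(subst(f[2], f[1], v2)))   # continuation
    if tag == "succ":
        n = yield from trace(e[1])
        if n[0] != "num":
            raise Stuck(e)
        return (yield from trace(("num", n[1] + 1)))       # dummy continuation
    raise Stuck(e)                            # free variable

omega = ("lam", "x", ("app", ("var", "x"), ("var", "x")))
Omega = ("app", omega, omega)
zero = ("num", 0)
inc = ("lam", "x", ("succ", ("var", "x")))

prefix = list(islice(trace(Omega), 7))        # first 7 elements of the infinite trace
finite = list(trace(("app", inc, zero)))      # a converging computation
```

The prefix computed for `Omega` repeats the pattern Omega, omega, omega, as forced by the equation on the infinite trace, while the finite trace ends with the result, in line with Theorem 1.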


3.2 Wrong 


A well-known technique [33] (Exercise 3.5.16) to distinguish between stuck and
diverging computations, in a sense “dual” to the previous one, is to add a special
result wrong, so that $c \Rightarrow \mathsf{wrong}$ means that the evaluation of $c$ goes stuck.

In this case, to define an “automatic” version of the construction, starting
from $\langle C, R, \mathcal{R} \rangle$, is a non-trivial problem. Our solution is based on defining a
relation on rules, modelling equality up to a certain index $i$, also used for other aims


4 To help the reader, we add equivalent expressions with a grey background. 
in the following. Consider $\rho \equiv \mathit{rule}(j_1 \ldots j_n, j_{n+1}, c)$, $\rho' \equiv \mathit{rule}(j'_1 \ldots j'_m, j'_{m+1}, c')$
and an index $i \in 1..\min(n+1, m+1)$, then $\rho \sim_i \rho'$ if


— $c = c'$

— for all $k < i$, $j_k = j'_k$

— $C(j_i) = C(j'_i)$

Intuitively, this means that rules $\rho$ and $\rho'$ model the same computation until
the $i$-th premise. Using this relation, we derive, from the judgment $c \Rightarrow r$, an
enriched big-step judgement $c \Rightarrow r_{wr}$ where $r_{wr} \in R \cup \{\mathsf{wrong}\}$, defined by a set
of rules $\mathcal{R}_{wr}$ containing all rules in $\mathcal{R}$ and two other kinds of rules:


wrong introduction  These rules derive wrong whenever the (sub)configuration
in a premise of a rule reduces to a result which is not admitted in such (or any
equivalent) rule: for each $\rho \equiv \mathit{rule}(j_1 \ldots j_n, j_{n+1}, c)$ in $\mathcal{R}$, index $i \in 1..n+1$,
and result $r \in R$, if for all rules $\rho'$ such that $\rho \sim_i \rho'$, $R(\rho', i) \ne r$, then we
add the rule $\mathit{wrong}(\rho, i, r)$ as follows:

\[ \frac{j_1 \ \ldots\ j_{i-1} \quad C(j_i) \Rightarrow r}{c \Rightarrow \mathsf{wrong}} \]

We also add an axiom $\frac{}{c \Rightarrow \mathsf{wrong}}$ for each configuration $c$ which is not the
conclusion of any rule.

wrong propagation  These rules propagate wrong analogously to those for
divergence propagation: for each $\rho \equiv \mathit{rule}(j_1 \ldots j_n, j_{n+1}, c)$ in $\mathcal{R}$, and index
$i \in 1..n+1$, we add the rule $\mathit{prop}(\rho, i, \mathsf{wrong})$ as follows:

\[ \frac{j_1 \ \ldots\ j_{i-1} \quad C(j_i) \Rightarrow \mathsf{wrong}}{c \Rightarrow \mathsf{wrong}} \]


We write $\mathcal{R}_{wr} \vdash c \Rightarrow r_{wr}$ when the judgment $c \Rightarrow r_{wr}$ is derivable in $\mathcal{R}_{wr}$.
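
For a finite set of rules, the construction above can be carried out mechanically. The following Python sketch is only illustrative: the `Rule` record, the helper names and the toy semantics (configurations `a`, `b`, `s` and results `u`, `v`) are assumptions made for the example and do not come from the running example, whose meta-rules denote infinitely many rules. It implements equality of rules up to an index and generates the premises and conclusion of each wrong-introduction and wrong-propagation rule.

```python
from collections import namedtuple

# A judgment c => r is a pair (c, r); a rule is rule(j1...jn, jn+1, c).
Rule = namedtuple("Rule", ["deps", "cont", "concl"])

def premises(rho):
    """All premises of a rule: dependencies followed by the continuation."""
    return list(rho.deps) + [rho.cont]

def equiv_upto(rho, rho2, i):
    """rho ~_i rho2, for a 1-based premise index i."""
    p, q = premises(rho), premises(rho2)
    if i > min(len(p), len(q)):
        return False
    return (rho.concl == rho2.concl
            and p[:i - 1] == q[:i - 1]          # j_k = j'_k for all k < i
            and p[i - 1][0] == q[i - 1][0])     # C(j_i) = C(j'_i)

def wrong_rules(rules, configs, results):
    """Pairs (premises, c) standing for the added rules premises / c => wrong."""
    out = []
    concls = {rho.concl for rho in rules}
    for c in sorted(configs):                   # axioms: non-results with no rule
        if c not in results and c not in concls:
            out.append(((), c))
    for rho in rules:
        p = premises(rho)
        for i in range(1, len(p) + 1):
            ci = p[i - 1][0]
            for r in sorted(results):
                admitted = any(equiv_upto(rho, rho2, i)
                               and premises(rho2)[i - 1][1] == r
                               for rho2 in rules)
                if not admitted:                # wrong(rho, i, r)
                    out.append((tuple(p[:i - 1]) + ((ci, r),), rho.concl))
            # prop(rho, i, wrong)
            out.append((tuple(p[:i - 1]) + ((ci, "wrong"),), rho.concl))
    return out

# Toy semantics: b evaluates to u or v, a only handles the case b => u,
# and s has no rule at all.
RESULTS = {"u", "v"}
CONFIGS = {"a", "b", "s", "u", "v"}
RULES = [
    Rule((("b", "u"),), ("u", "u"), "a"),
    Rule((), ("u", "u"), "b"),
    Rule((), ("v", "v"), "b"),
]
WRONG = wrong_rules(RULES, CONFIGS, RESULTS)
```

In this toy semantics the construction adds, among others, the axiom for `s` and the rule deriving wrong for `a` when `b` evaluates to `v`. It also adds rules with underivable premises (such as `u => v`), which are harmless.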

We show in Fig. 3 the meta-rules for wrong introduction and propagation
constructed starting from those for application and successor. For instance, rule
(wrong-app) is introduced since in the original semantics there is rule (app) with
$e_1\,e_2$ in the consequence and $e_1$ in the first premise, but there is no equivalent
rule (that is, with $e_1\,e_2$ in the consequence and $e_1$ in the first premise) such that
the result in the first premise is $n$.

The wrong construction is conservative as well. 


Theorem 2. $\mathcal{R}_{wr} \vdash c \Rightarrow r$ iff $\mathcal{R} \vdash c \Rightarrow r$.


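(WRONG-APP) $\frac{e_1 \Rightarrow n}{e_1\,e_2 \Rightarrow \mathsf{wrong}}$    (WRONG-SUCC) $\frac{e \Rightarrow \lambda x.e'}{\mathsf{succ}\ e \Rightarrow \mathsf{wrong}}$

(PROP-APP-1) $\frac{e_1 \Rightarrow \mathsf{wrong}}{e_1\,e_2 \Rightarrow \mathsf{wrong}}$    (PROP-APP-2) $\frac{e_1 \Rightarrow \lambda x.e \quad e_2 \Rightarrow \mathsf{wrong}}{e_1\,e_2 \Rightarrow \mathsf{wrong}}$

(PROP-APP-3) $\frac{e_1 \Rightarrow \lambda x.e \quad e_2 \Rightarrow v_2 \quad e[v_2/x] \Rightarrow \mathsf{wrong}}{e_1\,e_2 \Rightarrow \mathsf{wrong}}$    (PROP-SUCC) $\frac{e \Rightarrow \mathsf{wrong}}{\mathsf{succ}\ e \Rightarrow \mathsf{wrong}}$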


Fig. 3. Semantics with wrong for application and successor 
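
The effect of the evaluation order on the wrong semantics, discussed at the end of Sect. 2, can be observed with a small interpreter for the deterministic fragment. This is a sketch under assumptions: the term encoding, the names `eval_wr` and `OutOfFuel`, and the use of a fuel counter to cut off diverging computations are all hypothetical devices of the example (the wrong semantics itself gives no result to diverging computations). With the left-to-right rule (app), applying a number to a diverging argument yields wrong; with the right-to-left rule (app-r) the same term diverges.

```python
WRONG = "wrong"

class OutOfFuel(Exception):
    """Stands in for divergence: the recursion budget was exhausted."""

def subst(e, x, v):
    """Replace free occurrences of variable x in e by the closed value v."""
    tag = e[0]
    if tag == "num":
        return e
    if tag == "var":
        return v if e[1] == x else e
    if tag == "lam":
        return e if e[1] == x else ("lam", e[1], subst(e[2], x, v))
    if tag == "succ":
        return ("succ", subst(e[1], x, v))
    return ("app", subst(e[1], x, v), subst(e[2], x, v))

def eval_wr(e, fuel=200, left_to_right=True):
    """Evaluate e to a value or WRONG; raise OutOfFuel if the budget runs out."""
    if fuel == 0:
        raise OutOfFuel
    tag = e[0]
    if tag in ("num", "lam"):                      # axiom for results
        return e
    if tag == "var":                               # not the conclusion of any rule
        return WRONG
    if tag == "succ":
        r = eval_wr(e[1], fuel - 1, left_to_right)
        if r == WRONG or r[0] != "num":            # (prop-succ) / (wrong-succ)
            return WRONG
        return ("num", r[1] + 1)
    # application: the order of the first two premises depends on the rule chosen
    if left_to_right:                              # rule (app)
        f = eval_wr(e[1], fuel - 1, left_to_right)
        if f == WRONG or f[0] != "lam":            # (prop-app-1) / (wrong-app)
            return WRONG
        a = eval_wr(e[2], fuel - 1, left_to_right)
        if a == WRONG:                             # (prop-app-2)
            return WRONG
    else:                                          # rule (app-r)
        a = eval_wr(e[2], fuel - 1, left_to_right)
        if a == WRONG:
            return WRONG
        f = eval_wr(e[1], fuel - 1, left_to_right)
        if f == WRONG or f[0] != "lam":
            return WRONG
    return eval_wr(subst(f[2], f[1], a), fuel - 1, left_to_right)   # continuation

omega = ("lam", "x", ("app", ("var", "x"), ("var", "x")))
Omega = ("app", omega, omega)
zero = ("num", 0)
stuck_then_diverging = ("app", zero, Omega)
```

Calling `eval_wr(stuck_then_diverging)` returns `"wrong"`, whereas `eval_wr(stuck_then_diverging, left_to_right=False)` raises `OutOfFuel`, since the diverging argument is evaluated first.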
4 Expressing and proving soundness


A predicate (for instance, a typing judgment) is sound when, informally, a pro- 
gram satisfying the predicate (e.g., a well-typed program) cannot go wrong, fol- 
lowing Robin Milner’s slogan [31]. In small-step style, as firstly formulated in [40], 
this is naturally expressed as follows: well-typed programs never reduce to terms 
which neither are values, nor can be further reduced (called stuck terms). The 
standard technique to ensure soundness is by subject reduction (well-typedness 
is preserved by reduction) and progress (a well-typed term is not stuck). 

We discuss how soundness can be expressed for the two approaches previously 
presented and we introduce sufficient conditions. In other words, we provide a 
proof technique to show the soundness of a predicate with respect to a big-step 
semantics. As mentioned in the Introduction, the extended semantics is only 
needed to prove the correctness of the technique, whereas to apply the technique 
for a given big-step semantics it is enough to reason on the original rules. 


4.1 Expressing soundness 


In the following, we assume a big-step semantics $\langle C, R, \mathcal{R} \rangle$, and an indexed
predicate on configurations, that is, a family $\Pi = (\Pi_\iota)_{\iota \in I}$, for $I$ a set of indexes,
with $\Pi_\iota \subseteq C$. A representative case is that, as in the examples of Sect. 5,
the predicate is a typing judgment and the indexes are types; however, the
proof technique could be applied to other kinds of predicates. When there is no
ambiguity, we also denote by $\Pi$ the corresponding predicate $\bigcup_{\iota \in I} \Pi_\iota$ on $C$ (e.g.,
to be well-typed with an arbitrary type).

To discuss how to express soundness of $\Pi$, first of all note that, in the
non-deterministic case (that is, there is possibly more than one computation for a
configuration), we can distinguish two flavours of soundness [21]:


soundness-must (or simply soundness) no computation can be stuck 
soundness-may at least one computation is not stuck 


Soundness-must is the standard soundness in small-step semantics, and can be 
expressed in the wrong extension as follows: 


soundness-must (wrong) If $c \in \Pi$, then $\mathcal{R}_{wr} \nvdash c \Rightarrow \mathsf{wrong}$


Instead, soundness-must cannot be expressed in the trace extension. Indeed, 
stuck computations are not explicitly modelled. Conversely, soundness-may can 
be expressed in the trace extension as follows: 


soundness-may (traces) If $c \in \Pi$, then there is $t$ such that $\mathcal{R}_{tr} \vdash c \Rightarrow_{tr} t$


whereas it cannot be expressed in the wrong semantics, since diverging
computations are not modelled.

Of course soundness-must and soundness-may coincide in the deterministic 
case. Finally, note that indexes (e.g., the specific types of configurations) do 
not play any role in the above statements. However, they are relevant in the 
notion of strong soundness, introduced by [40]. Strong soundness holds if, for
configurations satisfying $\Pi_\iota$ (e.g., having a given type), computation cannot be
stuck, and moreover, produces a result satisfying $\Pi_\iota$ (e.g., of the same type)
if terminating. Note that soundness alone does not even guarantee to obtain a
result satisfying $\Pi$ (e.g., a well-typed result). The three conditions introduced
in the following section actually ensure strong soundness.

In Sect. 4.2 we provide sufficient conditions for soundness-must, showing that 
they actually ensure soundness in the wrong semantics (Theorem 3). Then, in 
Sect. 4.3, we provide (weaker) sufficient conditions for soundness-may, and show 
that they actually ensure soundness-may in the trace semantics (Theorem 4). 


4.2 Conditions ensuring soundness-must 


The three conditions which ensure the soundness-must property are local
preservation, $\exists$-progress, and $\forall$-progress. The names suggest that the first plays the
role of the type preservation (subject reduction) property, and the latter two
of the progress property in small-step semantics. However, as we will see, the
correspondence is only rough, since the reasoning here is different.

Considering the first condition more closely, we use the name preservation 
rather than type preservation since, as already mentioned, the proof technique 
can be applied to arbitrary predicates. More importantly, local means that the 
condition is on single rules rather than on the semantic relation as a whole, as 
standard subject reduction. The same holds for the other two conditions. 


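Definition 1 (S1: Local Preservation). For each $\rho \equiv \mathit{rule}(j_1 \ldots j_n, j_{n+1}, c)$, if
$c \in \Pi_\iota$, then there exist $\iota_1, \ldots, \iota_{n+1} \in I$, with $\iota_{n+1} = \iota$, such that, for all $k \in 1..n+1$:


if, for all $h < k$, $R(j_h) \in \Pi_{\iota_h}$, then $C(j_k) \in \Pi_{\iota_k}$.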


Thinking of the paradigmatic case where the indexes are types, for each rule
$\rho$, if the configuration $c$ in the consequence has type $\iota$, we have to find types
$\iota_1, \ldots, \iota_{n+1}$ which can be assigned to (the configurations in) the premises, in
particular the same type as $c$ for the continuation. More precisely, we start by
finding type $\iota_1$, and successively find the type $\iota_k$ for (the configuration in) the $k$-th
premise assuming that the results of all the previous premises have the expected
types. Indeed, if all such previous premises are derivable, then the expected type
should be preserved by their results; if some premise is not derivable, the
considered rule is “useless”. For instance, considering (an instantiation of) meta-rule
(app) $\mathit{rule}(e_1 \Rightarrow \lambda x.e \ \ e_2 \Rightarrow v_2,\ e[v_2/x] \Rightarrow v,\ e_1\,e_2)$ in Sect. 2, we prove that $e[v_2/x]$
has the type $T$ of $e_1\,e_2$ under the assumption that $\lambda x.e$ has type $T' \to T$, and
$v_2$ has type $T'$ (see the proof example in Sect. 5.1 for more details).
A counter-example to condition S1 is discussed at the beginning of Sect. 5.3.

The following lemma states that local preservation actually implies preser- 
vation of the semantic relation as a whole. 


Lemma 1 (Preservation). Let $\mathcal{R}$ and $\Pi$ satisfy condition S1. If $\mathcal{R} \vdash c \Rightarrow r$
and $c \in \Pi_\iota$, then $r \in \Pi_\iota$.

Proof. The proof is by a double induction. We denote by RH and IH the first
and the second induction hypothesis, respectively. The first induction is on
big-step rules. Axioms have conclusion $r \Rightarrow r$, hence the thesis holds since $r \in \Pi_\iota$ by
hypothesis. Other rules have shape $\mathit{rule}(j_1 \ldots j_n, j_{n+1}, c)$ with $c \in \Pi_\iota$. We prove
by complete induction on $k \in 1..n+1$ that $C(j_k) \in \Pi_{\iota_k}$, for all $k \in 1..n+1$ and
for some $\iota_1, \ldots, \iota_{n+1} \in I$. By S1, there are $\iota_1, \ldots, \iota_{n+1} \in I$ and $C(j_1) \in \Pi_{\iota_1}$.
For $k > 1$, by IH we know that $C(j_h) \in \Pi_{\iota_h}$, for all $h < k$. Then, by RH, we
get that $R(j_h) \in \Pi_{\iota_h}$. Moreover by S1, $C(j_k) \in \Pi_{\iota_k}$, as needed. In particular,
we have just proved that $C(j_{n+1}) \in \Pi_{\iota_{n+1}}$ and, since by S1 $\iota_{n+1} = \iota$, we get
$C(j_{n+1}) \in \Pi_\iota$. Then, by RH, we conclude that $r = R(j_{n+1}) \in \Pi_\iota$, as needed.


The following proposition is a form of local preservation where indexes (e.g., 
specific types) are not relevant, simpler to use in the proofs of Theorems 3 and 4. 


Proposition 1. Let $\mathcal{R}$ and $\Pi$ satisfy condition S1. For each $\mathit{rule}(j_1 \ldots j_n, j_{n+1}, c)$
and $k \in 1..n+1$, if $c \in \Pi$ and, for all $h < k$, $\mathcal{R} \vdash j_h$, then $C(j_k) \in \Pi$.


The second condition, named $\exists$-progress, ensures that, for configurations
satisfying the predicate $\Pi$ (e.g., well-typed), we can start constructing a proof tree.


Definition 2 (S2: $\exists$-progress). For each $c \in \Pi \setminus R$, $C(\rho) = c$ for some rule $\rho$.


The third condition, named $\forall$-progress, ensures that, for configurations
satisfying $\Pi$, we can continue constructing the proof tree. This condition uses the
notion of rules equivalent up to an index introduced at the beginning of Sect. 3.2.


Definition 3 (S3: $\forall$-progress). For each $\rho \equiv \mathit{rule}(j_1 \ldots j_n, j_{n+1}, c)$, if $c \in \Pi$,
then, for each $k \in 1..n+1$:


if, for all $h < k$, $\mathcal{R} \vdash j_h$ and $\mathcal{R} \vdash C(j_k) \Rightarrow r$, for some $r \in R$, then there
is a rule $\rho' \sim_k \rho$ such that $R(\rho', k) = r$.


We have to check, for each rule $\rho$, the following: if the configuration $c$ in the
consequence satisfies the predicate (e.g., is well-typed), then, for each $k$, if the
configuration in premise $k$ evaluates to some result $r$ (that is, $\mathcal{R} \vdash C(j_k) \Rightarrow r$),
then there is a rule ($\rho$ itself or another rule with the same configuration in the
consequence and the same first $k - 1$ premises) with such judgment as $k$-th premise.
This check can be done under the assumption that all the previous premises
are derivable. For instance, consider again (an instantiation of) the meta-rule
(app) $\mathit{rule}(e_1 \Rightarrow \lambda x.e \ \ e_2 \Rightarrow v_2,\ e[v_2/x] \Rightarrow v,\ e_1\,e_2)$. Assuming that $e_1$ evaluates to
some $v_1$, we have to check that there is a rule with first premise $e_1 \Rightarrow v_1$, in
practice, that $v_1$ is a $\lambda$-abstraction; in general, checking S3 for a (meta-)rule
amounts to showing that (sub)configurations in the premises evaluate to results
with the required shape (see also the proof example in Sect. 5.1).
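
When the set of rules is finite, conditions S2 and S3 can be checked by exhaustive search. The following Python sketch illustrates this; the `Rule` record, the function names and the toy semantics (configurations `a`, `b`, `s`, results `u`, `v`) are assumptions introduced for the example and are not taken from the paper. Derivability in the original semantics is computed as a least fixed point, starting from the implicit axioms for results.

```python
from collections import namedtuple

# A judgment c => r is a pair (c, r); a rule is rule(j1...jn, jn+1, c).
Rule = namedtuple("Rule", ["deps", "cont", "concl"])

def premises(rho):
    return list(rho.deps) + [rho.cont]

def derivable(rules, results):
    """Least fixed point: all judgments derivable from a finite rule set."""
    known = {(r, r) for r in results}            # implicit axioms r => r
    changed = True
    while changed:
        changed = False
        for rho in rules:
            j = (rho.concl, rho.cont[1])         # conclusion c => R(j_{n+1})
            if j not in known and all(p in known for p in premises(rho)):
                known.add(j)
                changed = True
    return known

def equiv_upto(rho, rho2, k):
    """rho ~_k rho2, for a 1-based premise index k."""
    p, q = premises(rho), premises(rho2)
    if k > min(len(p), len(q)):
        return False
    return (rho.concl == rho2.concl and p[:k - 1] == q[:k - 1]
            and p[k - 1][0] == q[k - 1][0])

def check_s2(rules, results, pred):
    """S2: every non-result configuration in pred is the conclusion of a rule."""
    concls = {rho.concl for rho in rules}
    return all(c in concls for c in pred if c not in results)

def check_s3(rules, results, pred):
    """S3: every derivable outcome of premise k is covered by an equivalent rule."""
    known = derivable(rules, results)
    for rho in rules:
        if rho.concl not in pred:
            continue
        p = premises(rho)
        for k in range(1, len(p) + 1):
            if not all(j in known for j in p[:k - 1]):
                continue                         # earlier premises not derivable
            for r in results:
                covered = any(equiv_upto(rho2, rho, k)
                              and premises(rho2)[k - 1][1] == r
                              for rho2 in rules)
                if (p[k - 1][0], r) in known and not covered:
                    return False
    return True

# Toy semantics: b evaluates to u or v, but a only handles the case b => u.
RESULTS = {"u", "v"}
RULES = [
    Rule((("b", "u"),), ("u", "u"), "a"),
    Rule((), ("u", "u"), "b"),
    Rule((), ("v", "v"), "b"),
]
```

In this toy semantics, S3 fails for any predicate containing `a` (because `b => v` is derivable but no rule for `a` has it as first premise), and S2 fails for any predicate containing the rule-less configuration `s`.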


Soundness-must in wrong semantics  Recall that $\mathcal{R}_{wr}$ is the extension of $\mathcal{R}$ with
wrong (Sect. 3.2). We prove the claim of soundness-must with respect to $\mathcal{R}_{wr}$.

Theorem 3. Let $\mathcal{R}$ and $\Pi$ satisfy conditions S1, S2 and S3. If $c \in \Pi$, then
$\mathcal{R}_{wr} \nvdash c \Rightarrow \mathsf{wrong}$.


Proof. To prove the statement, we assume $\mathcal{R}_{wr} \vdash c \Rightarrow \mathsf{wrong}$ and look for a
contradiction. The proof is by induction on the derivation of $c \Rightarrow \mathsf{wrong}$.

If the last applied rule is an axiom, then, by construction, there is no rule $\rho \in \mathcal{R}$
such that $C(\rho) = c$, and this violates condition S2, since $c \in \Pi$.

If the last applied rule is $\mathit{wrong}(\rho, i, r)$, with $\rho \equiv \mathit{rule}(j_1 \ldots j_n, j_{n+1}, c)$, then,
by hypothesis, for all $k < i$, $\mathcal{R}_{wr} \vdash j_k$, and $\mathcal{R}_{wr} \vdash C(j_i) \Rightarrow r$, and these
judgments can also be derived in $\mathcal{R}$ by conservativity (Theorem 2). Furthermore, by
construction of this rule, we know that there is no other rule $\rho' \sim_i \rho$ such that
$R(\rho', i) = r$, and this violates condition S3, since $c \in \Pi$.

If the last applied rule is $\mathit{prop}(\rho, i, \mathsf{wrong})$, with $\rho \equiv \mathit{rule}(j_1 \ldots j_n, j_{n+1}, c)$, then,
by hypothesis, for all $k < i$, $\mathcal{R}_{wr} \vdash j_k$, and these judgments can also be derived
in $\mathcal{R}$ by conservativity. Then, by Prop. 1 (which requires condition S1), since
$c \in \Pi$, we have $C(j_i) \in \Pi$, hence we get the thesis by induction hypothesis.


Sect. 5.1 ends with examples not satisfying properties S2 and S3. 


4.3 Conditions ensuring soundness-may 


As discussed in Sect. 4.1, in the trace semantics we can only express a weaker 
form of soundness: at least one computation is not stuck (soundness-may). As 
the reader can expect, to ensure this property weaker sufficient conditions are 
enough: namely, condition S1, and another condition named progress-may and 
defined below. 

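We write $\mathcal{R} \nvdash c \Rightarrow$ if $c$ does not converge (there is no $r$ such that $\mathcal{R} \vdash c \Rightarrow r$).


Definition 4 (S4: progress-may). For each $c \in \Pi \setminus R$, there is
$\rho \equiv \mathit{rule}(j_1 \ldots j_n, j_{n+1}, c)$ such that:


if there is a (first) $k \in 1..n+1$ such that $\mathcal{R} \nvdash j_k$ and, for all $h < k$,
$\mathcal{R} \vdash j_h$, then $\mathcal{R} \nvdash C(j_k) \Rightarrow$.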


This condition can be informally understood as follows: we have to show that 
there is an either finite or infinite computation for c. If we find a rule where all 
premises are derivable (no k), then there is a finite computation. Otherwise, c 
does not converge. In this case, we should find a rule where the configuration in 
the first non-derivable premise k does not converge as well. Indeed, by coinduc- 
tive reasoning (use of Lemma 2 below), we obtain that c diverges. The following 
proposition states that this condition is indeed a weakening of S2 and S3. 


Proposition 2. Conditions S2 and S3 imply condition S4. 


Soundness-may in trace semantics  Recall that $\mathcal{R}_{tr}$ is the extension of $\mathcal{R}$ with
traces, defined in Sect. 3.1, where judgements have shape $c \Rightarrow_{tr} t$, with $t \in C^\infty$.
The following lemma provides a proof principle useful to coinductively show
that a property ensures the existence of an infinite trace, in particular to show
Theorem 4. It is a slight variation of an analogous principle presented in [8].
Lemma 2. Let $S \subseteq C$ be a set. If, for all $c \in S$, there are $\rho \equiv \mathit{rule}(j_1 \ldots j_n, j_{n+1}, c)$
and $k \in 1..n+1$ such that


1. for all $h < k$, $\mathcal{R} \vdash j_h$, and
2. $C(j_k) \in S$,


then, for all $c \in S$, there is $t \in C^\omega$ such that $\mathcal{R}_{tr} \vdash c \Rightarrow_{tr} t$.


Theorem 4. Let $\mathcal{R}$ and $\Pi$ satisfy conditions S1 and S4. If $c \in \Pi$, then there
is $t$ such that $\mathcal{R}_{tr} \vdash c \Rightarrow_{tr} t$.


Proof. First note that, thanks to Theorem 1, the statement is equivalent to the
following:

If $c \in \Pi$ and $\mathcal{R} \nvdash c \Rightarrow$, then there is $t \in C^\omega$ such that $\mathcal{R}_{tr} \vdash c \Rightarrow_{tr} t$.

Then, the proof follows from Lemma 2. We define $S = \{c \mid c \in \Pi \text{ and } \mathcal{R} \nvdash c \Rightarrow\}$,
and show that, for all $c \in S$, there are $\rho \equiv \mathit{rule}(j_1 \ldots j_n, j_{n+1}, c)$ and $k \in 1..n+1$
such that, for all $h < k$, $\mathcal{R} \vdash j_h$, and $C(j_k) \in S$.

Consider $c \in S$, then, by S4, there is $\rho \equiv \mathit{rule}(j_1 \ldots j_n, j_{n+1}, c)$. By definition
of $S$, we have $\mathcal{R} \nvdash c \Rightarrow$, hence there exists a (first) $k \in 1..n+1$ such that $\mathcal{R} \nvdash j_k$,
since, otherwise, we would have $\mathcal{R} \vdash c \Rightarrow R(j_{n+1})$. Then, since $k$ is the first index
with such property, for all $h < k$, we have $\mathcal{R} \vdash j_h$, hence, again by condition
S4, we have that $\mathcal{R} \nvdash C(j_k) \Rightarrow$. Finally, since for all $h < k$ we have $\mathcal{R} \vdash j_h$, by
Prop. 1, we get $C(j_k) \in \Pi$, hence $C(j_k) \in S$, as needed.


5 Examples 


Sect. 5.1 explains in detail how a typical soundness proof can be rephrased in 
terms of our technique, by reasoning directly on big-step rules. Sect. 5.2 shows 
a case where this is advantageous, since the property to be checked is not pre- 
served by intermediate computation steps, whereas it holds for the final result. 
Sect. 5.3 considers a more sophisticated type system, with intersection and union 
types. Finally, Sect. 5.4 shows another example where subject reduction is not 
preserved, whereas soundness can be proved with our technique. This example 
is intended as a preliminary step towards a more challenging case. 


5.1 Simply-typed $\lambda$-calculus with recursive types


As a first example, we take the $\lambda$-calculus with natural constants, successor, and
choice used in Sect. 2 (Fig. 1). We consider a standard simply-typed version with
recursive types, obtained by interpreting the production in Fig. 4 coinductively.
Introducing recursive types makes the calculus non-normalising and makes it possible to
write interesting programs such as $\Omega$ (see Sect. 3.1).

The typing rules are recalled in Fig. 4. Type environments, written $\Gamma$, are
finite maps from variables to types, and $\Gamma\{T/x\}$ denotes the map which returns
$T$ on $x$ and coincides with $\Gamma$ elsewhere. We write $\vdash e : T$ for $\emptyset \vdash e : T$.

Let $\mathcal{R}_1$ be the big-step semantics defined in Fig. 1, and let $\Pi^1_T(e)$ hold if
$\vdash e : T$, for $T$ defined in Fig. 4. To prove the three conditions S1, S2 and S3 of
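$T ::= \mathsf{Nat} \mid T_1 \to T_2$   type

(T-VAR) $\frac{}{\Gamma \vdash x : T}$  $\Gamma(x) = T$    (T-CONST) $\frac{}{\Gamma \vdash n : \mathsf{Nat}}$

(T-ABS) $\frac{\Gamma\{T'/x\} \vdash e : T}{\Gamma \vdash \lambda x.e : T' \to T}$    (T-APP) $\frac{\Gamma \vdash e_1 : T' \to T \quad \Gamma \vdash e_2 : T'}{\Gamma \vdash e_1\,e_2 : T}$

(T-SUCC) $\frac{\Gamma \vdash e : \mathsf{Nat}}{\Gamma \vdash \mathsf{succ}\ e : \mathsf{Nat}}$    (T-CHOICE) $\frac{\Gamma \vdash e_1 : T \quad \Gamma \vdash e_2 : T}{\Gamma \vdash e_1 \oplus e_2 : T}$


Fig. 4. $\lambda$-calculus: type system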


Sect. 4.2, we need lemmas of inversion, substitution and canonical forms, as in 
the standard technique. 
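
As a small worked illustration of why recursive types make the diverging term of Sect. 3.1 typable (this derivation is added here as an example; the name $T_\omega$ is an assumption of the sketch), let $T_\omega$ be the type satisfying $T_\omega = T_\omega \to \mathsf{Nat}$, which exists because the type grammar is interpreted coinductively. Using only the variable, application and abstraction rules:

```latex
\begin{align*}
& x : T_\omega \vdash x : T_\omega \to \mathsf{Nat}
  && \text{variable rule, since } T_\omega = T_\omega \to \mathsf{Nat} \\
& x : T_\omega \vdash x : T_\omega
  && \text{variable rule} \\
& x : T_\omega \vdash x\,x : \mathsf{Nat}
  && \text{application rule} \\
& \vdash \lambda x.\,x\,x : T_\omega \to \mathsf{Nat}
  && \text{abstraction rule} \\
& \vdash (\lambda x.\,x\,x)(\lambda x.\,x\,x) : \mathsf{Nat}
  && \text{application rule, using } T_\omega \to \mathsf{Nat} = T_\omega
\end{align*}
```

Hence the diverging term is well-typed with type Nat, although its evaluation never produces a result.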


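Lemma 3 (Inversion).


1. If $\Gamma \vdash x : T$, then $\Gamma(x) = T$.
2. If $\Gamma \vdash n : T$, then $T = \mathsf{Nat}$.
3. If $\Gamma \vdash \lambda x.e : T$, then $T = T_1 \to T_2$ and $\Gamma\{T_1/x\} \vdash e : T_2$.
4. If $\Gamma \vdash e_1\,e_2 : T$, then $\Gamma \vdash e_1 : T' \to T$ and $\Gamma \vdash e_2 : T'$.
5. If $\Gamma \vdash \mathsf{succ}\ e : T$, then $T = \mathsf{Nat}$ and $\Gamma \vdash e : \mathsf{Nat}$.
6. If $\Gamma \vdash e_1 \oplus e_2 : T$, then $\Gamma \vdash e_i : T$ with $i \in 1,2$.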


Lemma 4 (Substitution). If $\Gamma\{T'/x\} \vdash e : T$ and $\Gamma \vdash e' : T'$, then $\Gamma \vdash e[e'/x] : T$.

Lemma 5 (Canonical Forms).


1. If $\vdash v : T' \to T$, then $v = \lambda x.e$.
2. If $\vdash v : \mathsf{Nat}$, then $v = n$.


Theorem 5 (Soundness). The big-step semantics $\mathcal{R}_1$ and the indexed
predicate $\Pi^1$ satisfy the conditions S1, S2 and S3 of Sect. 4.2.


Since the aim of this first example is to illustrate the proof technique, we 
provide a proof where we explain the reasoning in detail. 


Proof of S1. We should prove this condition for each (instantiation of meta-)rule.
(app): Assume that $\vdash e_1\,e_2 : T$ holds. We have to find types for the premises,
notably $T$ for the last one. We proceed as follows:


1. First premise: by Lemma 3 (4), $\vdash e_1 : T' \to T$.

2. Second premise: again by Lemma 3 (4), $\vdash e_2 : T'$ (without needing the
assumption $\vdash \lambda x.e : T' \to T$).

3. Third premise: $\vdash e[v_2/x] : T$ should hold (assuming $\vdash \lambda x.e : T' \to T$,
$\vdash v_2 : T'$). Since $\vdash \lambda x.e : T' \to T$, by Lemma 3 (3) we have $x : T' \vdash e : T$, so
by Lemma 4 and $\vdash v_2 : T'$ we have $\vdash e[v_2/x] : T$.

(succ): This rule has an implicit continuation $n+1 \Rightarrow n+1$. Assume that
$\vdash \mathsf{succ}\ e : T$ holds. By Lemma 3 (5), $T = \mathsf{Nat}$, and $\vdash e : \mathsf{Nat}$, hence we find
Nat as type for the first premise. Moreover, $\vdash n+1 : \mathsf{Nat}$ holds by rule (T-CONST).
(choice): Assume that $\vdash e_1 \oplus e_2 : T$ holds. By Lemma 3 (6), we have $\vdash e_i : T$,
with $i \in 1,2$. Hence we find $T$ as type for the premise.


Proof of S2. We should prove that, for each non-result configuration (here, an
expression $e$ which is not a value) such that $\vdash e : T$ holds for some $T$, there is
a rule with this configuration in the consequence. The expression $e$ cannot be a
variable, since a variable cannot be typed in the empty environment. Application,
successor and choice appear as consequence in the reduction rules.


Proof of S3. We should prove this condition for each (instantiation of meta-)rule.
(app): Assuming $\vdash e_1\,e_2 : T$, again by Lemma 3 (4) we get $\vdash e_1 : T' \to T$.


1. First premise: if $e_1 \Rightarrow v$ is derivable, then there should be a rule with $e_1\,e_2$
in the consequence and $e_1 \Rightarrow v$ as first premise. Since we proved S1, by
preservation (Lemma 1) $\vdash v : T' \to T$ holds. Then, by Lemma 5 (1), $v$ has
shape $\lambda x.e$, hence the required rule exists. As noted in Sect. 4.2, in practice
checking S3 for a (meta-)rule amounts to showing that (sub)configurations in
the premises evaluate to results which have the required shape (to be a
$\lambda$-abstraction in this case).

2. Second premise: if $e_1 \Rightarrow \lambda x.e$ and $e_2 \Rightarrow v_2$, then there should be a rule with
$e_1\,e_2$ in the consequence and $e_1 \Rightarrow \lambda x.e$, $e_2 \Rightarrow v_2$ as first two premises. This is
trivial since the meta-variable $v_2$ can be freely instantiated in the meta-rule.


(succ): Assuming $\vdash \mathsf{succ}\ e : T$, again by Lemma 3 (5) we get $\vdash e : \mathsf{Nat}$. If $e \Rightarrow v$
is derivable, there should be a rule with $\mathsf{succ}\ e$ in the consequence and $e \Rightarrow v$ as
first premise. Indeed, by preservation (Lemma 1) and Lemma 5 (2), $v$ has shape
$n$. For the second premise, if $n+1 \Rightarrow v$ is derivable, then $v$ is necessarily $n+1$.
(choice): Trivial since the meta-variable $v$ can be freely instantiated.


An interesting remark is that, differently from the standard approach, there 
is no induction in the proof: everything is by cases. This is a consequence of the 
fact that, as discussed in Sect. 4.2, the three conditions are local, that is, they 
are conditions on single rules. Induction is “hidden” in the proof that those three 
conditions are sufficient to ensure soundness. 

If we drop in Fig. 1 rule (succ), then condition S2 fails, since there is no longer
a rule for the well-typed non-result configuration $\mathsf{succ}\ n$. If we add the (fool) rule
$\vdash 0\,0 : \mathsf{Nat}$, then condition S3 fails for rule (app), since $0 \Rightarrow 0$ is derivable, but
there is no rule with $0\,0$ in the conclusion and $0 \Rightarrow 0$ as first premise.


5.2 MiniFJ&A 


In this example, the language is a subset of FJ&λ [12], a calculus extending
Featherweight Java (FJ) with λ-abstractions and intersection types, introduced
in Java 8. To keep the example small, we do not consider intersections and focus
on one key typing feature: λ-abstractions can only be typed when occurring in a
context requiring a given type (called the target type). In a small-step semantics,
this poses a problem: reduction can move λ-abstractions into arbitrary contexts,
leading to intermediate terms which would be ill-typed. To maintain subject
reduction, in [12] λ-abstractions are decorated with their initial target type. In
a big-step semantics, there is no need for intermediate terms and annotations.

The syntax is given in the first part of Fig. 5. We assume sets of variables
x, class names C, interface names I, J, field names f, and method names m.
Interfaces which have exactly one method (dubbed functional interfaces) can be
used as target types. Expressions are those of FJ, plus λ-abstractions, and types
are class and interface names. In $\lambda xs.e$ we assume that $xs$ is not empty and $e$
is not a λ-abstraction. For simplicity, we only consider upcasts, which have no
runtime effect, but are important to allow the programmer to use λ-abstractions,
as exemplified in discussing typing rules.

To be concise, the class table is abstractly modelled as follows:

— fields(C) gives the sequence of field declarations $T_1\ f_1; \ldots T_n\ f_n;$ for class C

— mtype(T, m) gives, for each method m in class or interface T, the pair
$T_1 \ldots T_n \to T'$ consisting of the parameter types and return type

— mbody(C, m) gives, for each method m in class C, the pair $(x_1 \ldots x_n, e)$
consisting of the parameters and body

— <: is the reflexive and transitive closure of the union of the extends and
implements relations

— !mtype(I) gives, for each functional interface I, mtype(I, m), where m is the
only method of I.


The big-step semantics is given in the last part of Fig. 5. MiniFJ&λ shows
an example of instantiation of the framework where configurations include an
auxiliary structure, rather than being just language terms. In this case, the
structure is an environment E (a finite map from variables to values) modelling
the current stack frame. Results are values, which are either objects, of shape
$[vs]^C$, or λ-abstractions.

Rules for FJ constructs are straightforward. Note that, since we only consider
upcasts, casts have no runtime effect. Indeed, they are guaranteed to succeed on
well-typed expressions. Rule (λ-invk) shows that, when the receiver of a method
is a λ-abstraction, the method name is not significant at runtime, and the effect
is that the body of the function is evaluated as in the usual application.
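The evaluation rules described above can be sketched as a small recursive interpreter. This is only an illustration under stated assumptions: the class names (`Var`, `Invk`, `Obj`, ...), the tables `FIELDS` and `MBODY`, and the choice that a λ-abstraction evaluates to itself are ours, not taken from the paper.

```python
from dataclasses import dataclass

# Expression forms of the sketch; all class names here are illustrative.
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Field:
    recv: object
    name: str

@dataclass(frozen=True)
class New:
    cls: str
    args: tuple

@dataclass(frozen=True)
class Invk:
    recv: object
    meth: str
    args: tuple

@dataclass(frozen=True)
class Lam:
    params: tuple
    body: object

@dataclass(frozen=True)
class Cast:
    typ: str
    expr: object

@dataclass(frozen=True)
class Obj:          # the object value [vs]^C
    cls: str
    fields: tuple

# Hypothetical class table: field names per class and method bodies.
FIELDS = {"C": (), "D": (), "Pair": ("fst", "snd")}
MBODY = {("C", "call"): (("f", "x"), Invk(Var("f"), "m", (Var("x"),)))}

def evaluate(env, e):
    """Big-step evaluation of the configuration <env, e>."""
    if isinstance(e, Var):
        return env[e.name]
    if isinstance(e, Lam):                    # assumed: a lambda is already a value
        return e
    if isinstance(e, Cast):                   # upcasts have no runtime effect
        return evaluate(env, e.expr)
    if isinstance(e, New):
        return Obj(e.cls, tuple(evaluate(env, a) for a in e.args))
    if isinstance(e, Field):
        obj = evaluate(env, e.recv)
        return obj.fields[FIELDS[obj.cls].index(e.name)]
    if isinstance(e, Invk):
        recv = evaluate(env, e.recv)
        args = [evaluate(env, a) for a in e.args]
        if isinstance(recv, Lam):             # lambda receiver: method name ignored
            return evaluate(dict(zip(recv.params, args)), recv.body)
        params, body = MBODY[(recv.cls, e.meth)]
        frame = dict(zip(params, args), this=recv)   # fresh stack frame
        return evaluate(frame, body)
    raise TypeError(f"not an expression: {e!r}")

identity = Lam(("y",), Var("y"))
main = Invk(New("C", ()), "call", (Cast("I", identity), New("D", ())))
print(evaluate({}, main))   # Obj(cls='D', fields=())
```

Note that the method body and the λ body are each evaluated in a fresh environment, so no intermediate term containing a moved λ-abstraction is ever built.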

The type system is given in Fig. 6. Method bodies are expected to be well-
typed with respect to method types. Formally, mbody(C, m) and mtype(C, m)
are either both defined or both undefined: in the first case
$mbody(C, m) = (x_1 \ldots x_n, e)$, $mtype(C, m) = T_1 \ldots T_n \to T$, and
$x_1{:}T_1, \ldots, x_n{:}T_n, \mathtt{this}{:}C \vdash e : T$.
Moreover, we assume other standard FJ constraints on the class table, such
as no field hiding, no method overloading, and the same parameter and return
types in overriding.

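$$\begin{array}{lcll}
e & ::= & x \mid e.f \mid \mathtt{new}\ C(e_1,\ldots,e_n) \mid e.m(e_1,\ldots,e_n) \mid \lambda xs.e \mid (T)e & \text{expression}\\
xs & ::= & x_1 \ldots x_n & \text{variable list}\\
T & ::= & C \mid I & \text{type}\\
c & ::= & \langle E, e\rangle \mid v & \text{configuration}\\
v & ::= & [vs]^C \mid \lambda xs.e & \text{result (value)}\\
vs & ::= & v_1 \ldots v_n & \text{value list}
\end{array}$$

$$\text{(var)}\ \dfrac{}{\langle E, x\rangle \Rightarrow v}\quad E(x) = v$$

$$\text{(field-access)}\ \dfrac{\langle E, e\rangle \Rightarrow [v_1,\ldots,v_n]^C}{\langle E, e.f_i\rangle \Rightarrow v_i}\quad fields(C) = T_1\ f_1; \ldots T_n\ f_n;\quad i \in 1..n$$

$$\text{(new)}\ \dfrac{\langle E, e_i\rangle \Rightarrow v_i\ \ \forall i \in 1..n}{\langle E, \mathtt{new}\ C(e_1,\ldots,e_n)\rangle \Rightarrow [v_1,\ldots,v_n]^C}$$

$$\text{(invk)}\ \dfrac{\langle E, e_0\rangle \Rightarrow [vs]^C \qquad \langle E, e_i\rangle \Rightarrow v_i\ \ \forall i \in 1..n \qquad \langle x_1{:}v_1,\ldots,x_n{:}v_n, \mathtt{this}{:}[vs]^C,\ e\rangle \Rightarrow v}{\langle E, e_0.m(e_1,\ldots,e_n)\rangle \Rightarrow v}\quad mbody(C, m) = (x_1 \ldots x_n, e)$$

$$\text{(λ-invk)}\ \dfrac{\langle E, e_0\rangle \Rightarrow \lambda xs.e \qquad \langle E, e_i\rangle \Rightarrow v_i\ \ \forall i \in 1..n \qquad \langle x_1{:}v_1,\ldots,x_n{:}v_n,\ e\rangle \Rightarrow v}{\langle E, e_0.m(e_1,\ldots,e_n)\rangle \Rightarrow v}$$

$$\text{(upcast)}\ \dfrac{\langle E, e\rangle \Rightarrow v}{\langle E, (T)e\rangle \Rightarrow v}$$

Fig. 5. MiniFJ&λ: syntax and big-step semantics


Besides the standard typing features of FJ, the MiniFJ&λ type system
ensures the following.
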


— A functional interface I can be assigned as type to a λ-abstraction which has
the functional type of the method, see rule (t-λ).

— A λ-abstraction should have a target type determined by the context where
the λ-abstraction occurs. More precisely, see [25] page 602, a λ-abstraction
in our calculus can only occur as return expression of a method or argument
of constructor, method call or cast. Then, in some contexts a λ-abstraction
cannot be typed, in our calculus when occurring as receiver in field access or
method invocation, hence these cases should be prevented. This is implicit
in rule (t-field-access), since the type of the receiver should be a class name,
whereas it is explicitly forbidden in rule (t-invk). For the same reason, a
λ-abstraction cannot be the main expression to be evaluated.

— A λ-abstraction with a given target type J should have type exactly J: a
subtype I of J is not enough. Consider, for instance, the following program:


interface J {}
interface I extends J { A m(A x); }
class C {
  C m(I y) { return new C().n(y); }
  C n(J y) { return new C(); }
}




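$$\text{(t-conf)}\ \dfrac{\vdash v_i : T'_i\ \ \forall i \in 1..n \qquad x_1{:}T_1,\ldots,x_n{:}T_n \vdash e : T}{\vdash \langle x_1{:}v_1,\ldots,x_n{:}v_n,\ e\rangle : T}\quad T'_i <: T_i\ \ \forall i \in 1..n$$

$$\text{(t-var)}\ \dfrac{}{\Gamma \vdash x : T}\quad \Gamma(x) = T$$

$$\text{(t-field-access)}\ \dfrac{\Gamma \vdash e : C}{\Gamma \vdash e.f_i : T_i}\quad fields(C) = T_1\ f_1; \ldots T_n\ f_n;\quad i \in 1..n$$

$$\text{(t-new)}\ \dfrac{\Gamma \vdash e_i : T_i\ \ \forall i \in 1..n}{\Gamma \vdash \mathtt{new}\ C(e_1,\ldots,e_n) : C}\quad fields(C) = T_1\ f_1; \ldots T_n\ f_n;$$

$$\text{(t-invk)}\ \dfrac{\Gamma \vdash e_i : T_i\ \ \forall i \in 0..n}{\Gamma \vdash e_0.m(e_1,\ldots,e_n) : T}\quad e_0 \text{ not of shape } \lambda xs.e \quad mtype(T_0, m) = T_1 \ldots T_n \to T$$

$$\text{(t-λ)}\ \dfrac{x_1{:}T_1,\ldots,x_n{:}T_n \vdash e : T}{\Gamma \vdash \lambda xs.e : I}\quad !mtype(I) = T_1 \ldots T_n \to T$$

$$\text{(t-upcast)}\ \dfrac{\Gamma \vdash e : T}{\Gamma \vdash (T)e : T}$$

$$\text{(t-object)}\ \dfrac{\Gamma \vdash v_i : T'_i\ \ \forall i \in 1..n}{\Gamma \vdash [v_1,\ldots,v_n]^C : C}\quad fields(C) = T_1\ f_1; \ldots T_n\ f_n; \quad T'_i <: T_i\ \ \forall i \in 1..n$$

$$\text{(t-sub)}\ \dfrac{\Gamma \vdash e : T}{\Gamma \vdash e : T'}\quad e \text{ not of shape } \lambda xs.e \quad T <: T'$$

Fig. 6. MiniFJ&λ: type system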


and the main expression new C().n(λx.x). Here, the λ-abstraction has
target type J, which is not a functional interface, hence the expression is
ill-typed in Java (the compiler has no functional type against which to
type-check the λ-abstraction). On the other hand, in the body of method m, the
parameter y of type I can be passed, as usual, to method n expecting a
supertype. For instance, the main expression new C().m(λx.x) is well-typed,
since the λ-abstraction has target type I, and can be safely passed to method
n, since it is not used as function there. To formalise this behaviour, it is
forbidden to apply subsumption to λ-abstractions, see rule (t-sub).

— However, λ-abstractions occurring as results rather than in source code (that
is, in the environment and as fields of objects) are allowed to have a
subtype of the required type, see the explicit side condition in rules (t-conf)
and (t-object). For instance, if C is a class with one field J f, the expression
new C((I)λx.x) is well-typed, whereas new C(λx.x) is ill-typed, since rule
(t-sub) cannot be applied to λ-abstractions. When the expression is evaluated,
the result is $[\lambda x.x]^C$, which is well-typed.


As mentioned at the beginning, the obvious small-step semantics would produce
expressions that are not typable. In the above example, we get
$$\mathtt{new}\ C((I)\lambda x.x) \longrightarrow \mathtt{new}\ C(\lambda x.x) \longrightarrow [\lambda x.x]^C$$
and new C(λx.x) has no type, while new C((I)λx.x) and $[\lambda x.x]^C$ have type C.
We write $\Gamma \vdash e :<: T$ as short for $\Gamma \vdash e : T'$ and $T' <: T$ for some $T'$. In
order to state soundness, let $\mathcal{R}_2$ be the big-step semantics defined in Fig. 5, and let
$\Pi2_T(\langle E, e\rangle)$ hold if $\vdash \langle E, e\rangle :<: T$, and $\Pi2_T(v)$ if $\vdash v :<: T$, for $T$ defined in Fig. 5.


Theorem 6 (Soundness). The big-step semantics $\mathcal{R}_2$ and the indexed
predicate $\Pi2$ satisfy the conditions S1, S2 and S3 of Sect. 4.2.


5.3 Intersection and union types 


We enrich the type system of Fig. 4 by adding intersection and union type
constructors and the corresponding typing rules, see Fig. 7. As usual we require
an infinite number of arrows in each infinite path for the trees representing types.
Intersection types for the λ-calculus have been widely studied [11]. Union types
naturally model conditionals [26] and non-deterministic choice [22].


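$$T ::= \mathtt{Nat} \mid T_1 \to T_2 \mid T_1 \wedge T_2 \mid T_1 \vee T_2 \qquad \text{type}$$

$$(\wedge I)\ \dfrac{\Gamma \vdash e : T \qquad \Gamma \vdash e : S}{\Gamma \vdash e : T \wedge S} \qquad (\wedge E)\ \dfrac{\Gamma \vdash e : T \wedge S}{\Gamma \vdash e : T} \qquad (\wedge E)\ \dfrac{\Gamma \vdash e : T \wedge S}{\Gamma \vdash e : S}$$

$$(\vee I)\ \dfrac{\Gamma \vdash e : T}{\Gamma \vdash e : T \vee S} \qquad (\vee I)\ \dfrac{\Gamma \vdash e : S}{\Gamma \vdash e : T \vee S}$$

Fig. 7. Intersection and union types: syntax and typing rules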


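The typing rules for the introduction and the elimination of intersection
and union are standard, except for the absence of the union elimination rule:

$$(\vee E)\ \dfrac{\Gamma\{T/x\} \vdash e : V \qquad \Gamma\{S/x\} \vdash e : V \qquad \Gamma \vdash e' : T \vee S}{\Gamma \vdash e[e'/x] : V}$$

As a matter of fact, rule $(\vee E)$ is unsound for $\oplus$. For example, let us split the type
Nat into Even and Odd and add the expected typings for natural numbers. The
prefix addition + has type
$$(\mathtt{Even} \to \mathtt{Even} \to \mathtt{Even}) \wedge (\mathtt{Odd} \to \mathtt{Odd} \to \mathtt{Even})$$

and we derive

$$\dfrac{x{:}\mathtt{Even} \vdash +\,x\,x : \mathtt{Even} \qquad x{:}\mathtt{Odd} \vdash +\,x\,x : \mathtt{Even} \qquad \dfrac{\dfrac{\vdash 1 : \mathtt{Odd}}{\vdash 1 : \mathtt{Even} \vee \mathtt{Odd}}\,(\vee I) \qquad \dfrac{\vdash 2 : \mathtt{Even}}{\vdash 2 : \mathtt{Even} \vee \mathtt{Odd}}\,(\vee I)}{\vdash (1 \oplus 2) : \mathtt{Even} \vee \mathtt{Odd}}\,(\oplus)}{\vdash +\,(1 \oplus 2)\,(1 \oplus 2) : \mathtt{Even}}\,(\vee E)$$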


We cannot assign the type Even to 3, which is a possible result, so strong
soundness is lost. In the small-step approach, we cannot assign Even to the
intermediate term $+\,1\,2$, so subject reduction fails. In the big-step approach, there is no
such intermediate term; however, condition S1 fails for the reduction rule for +.
Indeed, considering the following instantiation of the rule:
$$\dfrac{1 \oplus 2 \Rightarrow 1 \qquad 1 \oplus 2 \Rightarrow 2 \qquad 3 \Rightarrow 3}{+\,(1 \oplus 2)\,(1 \oplus 2) \Rightarrow 3}$$
and the type Even for the consequence, we cannot assign this type to the
(configuration in) last premise (continuation).
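The counterexample can be checked mechanically by enumerating every result of the non-deterministic term. The sketch below is only an illustration; the tuple encoding of terms and the function name `results` are assumptions made for the example.

```python
from itertools import product

def results(e):
    """Set of all values a closed expression can evaluate to."""
    tag = e[0]
    if tag == "num":
        return {e[1]}
    if tag == "choice":           # e1 (+) e2: either branch may be taken
        return results(e[1]) | results(e[2])
    if tag == "plus":             # + e1 e2: each operand is evaluated independently
        return {a + b for a, b in product(results(e[1]), results(e[2]))}
    raise ValueError(tag)

one_or_two = ("choice", ("num", 1), ("num", 2))
term = ("plus", one_or_two, one_or_two)
print(sorted(results(term)))      # [2, 3, 4]
```

The odd result 3 arises exactly when the two occurrences of the shared subexpression take different branches, which is the case that union elimination overlooks.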
Intersection types also allow deriving meaningful types for expressions
containing variables applied to themselves; for example, we can derive
$$\vdash \lambda x.x\,x : (T \to S) \wedge T \to S$$
With union types all non-deterministic choices between typable expressions can
be typed too, since we can derive $\Gamma \vdash e_1 \oplus e_2 : T_1 \vee T_2$ from $\Gamma \vdash e_1 : T_1$ and
$\Gamma \vdash e_2 : T_2$.
In order to state soundness, let $\Pi3_T(e)$ be $\vdash e : T$, for $T$ defined in Fig. 7.


Theorem 7 (Soundness). The big-step semantics $\mathcal{R}_1$ and the indexed
predicate $\Pi3$ satisfy the conditions S1, S2 and S3 of Sect. 4.2.


5.4 MiniFJ&O 


A well-known example in which proving soundness with respect to small-step
semantics is extremely challenging is the standard type system with intersection
and union types [10] w.r.t. the pure λ-calculus with full reduction. Indeed, the
standard subject reduction technique fails⁵, since, for instance, we can derive
the type $(T \to T \to V) \wedge (S \to S \to V) \to (U \to T \vee S) \to U \to V$ for both
$\lambda x.\lambda y.\lambda z.x((\lambda t.t)(y\,z))((\lambda t.t)(y\,z))$ and $\lambda x.\lambda y.\lambda z.x(y\,z)(y\,z)$, but the
intermediate expressions $\lambda x.\lambda y.\lambda z.x((\lambda t.t)(y\,z))(y\,z)$ and $\lambda x.\lambda y.\lambda z.x(y\,z)((\lambda t.t)(y\,z))$ do
not have this type.

As the example shows, the key problem is that rule $(\vee E)$ can be applied to
an expression e where the same subexpression e’ occurs more than once. In the
non-deterministic case, as shown by the example in the previous section, this
is unsound, since e’ can reduce to different values. In the deterministic case,
instead, this is sound, but cannot be proved by subject reduction. Since with
big-step semantics there are no intermediate steps to be typed, our approach
seems very promising for investigating an alternative proof of soundness. Whereas
we leave this challenging problem to future work, here as a first step we describe a
(hypothetical) calculus with a much simpler version of the problematic feature.

The calculus is a variant of FJ [27] with intersection and union types.
Methods have intersection types with the same return type and different parameter
types, modelling a form of overloading. Union types enhance typability of
conditionals. The more interesting feature is the possibility of replacing an arbitrary
number of parameters with the same expression having a union type. We dub
this calculus MiniFJ&O.

Fig. 8 gives the syntax, big-step semantics and typing rules of MiniFJ&O.
We omit the standard big-step rule for conditional, and typing rules for boolean 


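5 For this reason, in [10] soundness is proved by an ad hoc technique, that is, by
considering parallel reduction and an equivalent type system à la Gentzen, which
enjoys the cut elimination property.

$$\begin{array}{lcll}
e & ::= & x \mid v \mid e.f \mid e.m(e_1,\ldots,e_n) \mid \mathtt{if}\ e\ \mathtt{then}\ e_1\ \mathtt{else}\ e_2 & \text{expression}\\
v & ::= & \mathtt{new}\ C(v_1,\ldots,v_n) \mid \mathtt{true} \mid \mathtt{false} & \text{value}\\
T & ::= & C \mid \mathtt{Bool} \mid \bigvee_{1 \le i \le n} T_i & \text{expression type}\\
MT & ::= & \bigwedge_{1 \le i \le m} (C^{(i)}_1 \ldots C^{(i)}_n \to D) & \text{method type}
\end{array}$$

$$\text{(field-access)}\ \dfrac{e \Rightarrow \mathtt{new}\ C(v_1,\ldots,v_n)}{e.f_i \Rightarrow v_i}\quad fields(C) = T_1\ f_1; \ldots T_n\ f_n;\quad i \in 1..n$$

$$\text{(new)}\ \dfrac{e_i \Rightarrow v_i\ \ \forall i \in 1..n}{\mathtt{new}\ C(e_1,\ldots,e_n) \Rightarrow \mathtt{new}\ C(v_1,\ldots,v_n)}$$

$$\text{(invk)}\ \dfrac{e_0 \Rightarrow \mathtt{new}\ C(vs') \qquad e_i \Rightarrow v_i\ \ \forall i \in 1..n \qquad e[v_1/x_1]\ldots[v_n/x_n][\mathtt{new}\ C(vs')/\mathtt{this}] \Rightarrow v}{e_0.m(e_1,\ldots,e_n) \Rightarrow v}\quad mbody(C, m) = (x_1 \ldots x_n, e)$$

$$\text{(t-var)}\ \dfrac{}{\Gamma \vdash x : T}\quad \Gamma(x) = T \qquad \text{(t-field-access)}\ \dfrac{\Gamma \vdash e : C}{\Gamma \vdash e.f_i : T_i}\quad fields(C) = T_1\ f_1; \ldots T_n\ f_n;\quad i \in 1..n$$

$$\text{(t-new)}\ \dfrac{\Gamma \vdash e_i : T_i\ \ \forall i \in 1..n}{\Gamma \vdash \mathtt{new}\ C(e_1,\ldots,e_n) : C}\quad fields(C) = T_1\ f_1; \ldots T_n\ f_n;$$

$$\text{(t-invk)}\ \dfrac{\Gamma \vdash e_i : C_i\ \ \forall i \in 0..n \qquad \Gamma \vdash e : \bigvee_{1 \le i \le m} D_i}{\Gamma \vdash e_0.m(e_1,\ldots,e_n,e,\ldots,e) : C}\quad mtype(C_0, m) <: \bigwedge_{1 \le i \le m} (C_1 \ldots C_n\, D_i \ldots D_i \to C)$$

$$\text{(t-if)}\ \dfrac{\Gamma \vdash e : \mathtt{Bool} \qquad \Gamma \vdash e_1 : T \qquad \Gamma \vdash e_2 : T}{\Gamma \vdash \mathtt{if}\ e\ \mathtt{then}\ e_1\ \mathtt{else}\ e_2 : T} \qquad \text{(t-sub)}\ \dfrac{\Gamma \vdash e : T}{\Gamma \vdash e : T'}\quad T <: T'$$

Fig. 8. MiniFJ&O: syntax, big-step semantics and type system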


constants. The subtyping relation <: is the reflexive and transitive closure of the
union of the extends relation and the standard rules for union:
$$T_1 <: T_1 \vee T_2 \qquad T_1 <: T_2 \vee T_1$$
On the other hand, method types (results of the mtype function) are now
intersection types, and the subtyping relation on them is the reflexive and transitive
closure of the standard rules for intersection:
$$MT_1 \wedge MT_2 <: MT_1 \qquad MT_1 \wedge MT_2 <: MT_2$$

The functions fields and mbody are defined as for MiniFJ&λ.

Instead, mtype(C, m) gives, for each method m in class C, an intersection type. We
assume mbody(C, m) and mtype(C, m) either both defined or both undefined: in
the first case $mbody(C, m) = (x_1 \ldots x_n, e)$, $mtype(C, m) = \bigwedge_{1 \le i \le m} (C^{(i)}_1 \ldots C^{(i)}_n \to D)$,
and $x_1{:}C^{(i)}_1, \ldots, x_n{:}C^{(i)}_n, \mathtt{this}{:}C \vdash e : D$ for $i \in 1..m$.

Clearly rule (t-invk) is inspired by rule $(\vee E)$, but the restriction to method
calls endows a standard inversion lemma. The subtyping in this rule allows
choosing the types for the method best fitting the types of the arguments. Not
surprisingly, subject reduction fails for the expected small-step semantics. For
example, let class C have a field point which contains cartesian coordinates and
class D have a field point which contains polar coordinates. The method eq takes
two objects and compares their point fields returning a boolean value. A type for
this method is $(C\,C \to \mathtt{Bool}) \wedge (D\,D \to \mathtt{Bool})$ and we can type eq(e, e), where
e = if false then new C(...) else new D(...)

In fact e has type $C \vee D$. Notice that in a standard small-step semantics

eq(e, e) → eq(new D(...), if false then new C(...) else new D(...))
and this last expression cannot be typed.
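The typing of eq(e, e) and the failure on the reduced call can be mimicked with a small check. This is a simplified sketch, not the rule (t-invk) itself: it assumes a method type is a list of (parameter types, return type) components, ignores class subtyping, and the function name `check_shared_call` is hypothetical.

```python
def check_shared_call(method_type, fixed_arg_types, union_members, shared_count):
    """Return the return type if every member of the union selects a component
    of the intersection type for the shared argument positions, else None."""
    returns = set()
    for d in union_members:
        # Parameter list required when the shared expression has type d.
        wanted = tuple(fixed_arg_types) + (d,) * shared_count
        matches = [ret for params, ret in method_type if params == wanted]
        if not matches:
            return None
        returns.update(matches)
    return returns.pop() if len(returns) == 1 else None

# (C C -> Bool) /\ (D D -> Bool)
eq_type = [(("C", "C"), "Bool"), (("D", "D"), "Bool")]

# eq(e, e) with e : C \/ D, the same expression in both positions.
print(check_shared_call(eq_type, (), ("C", "D"), 2))       # Bool

# eq(new D(...), e): first argument fixed to D, second still C \/ D.
print(check_shared_call(eq_type, ("D",), ("C", "D"), 1))   # None
```

The second call fails because no component accepts the parameter list (D, C), which mirrors the loss of typability after one small step.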


In order to state soundness, let $\mathcal{R}_4$ be the big-step semantics defined in Fig. 8,
and let $\Pi4_T(e)$ hold if $\vdash e : T$, for $T$ defined in Fig. 8.


Theorem 8 (Soundness). The big-step semantics $\mathcal{R}_4$ and the indexed
predicate $\Pi4$ satisfy the conditions S1, S2 and S3 of Sect. 4.2.


6 The partial evaluation construction 


In this section, our aim is to provide a formal justification that the constructions 
in Sect. 3 are correct. For instance, for the wrong semantics we would like to be 
sure that all the cases are covered. To this end, we define a third construction, 
dubbed PEv for “partial evaluation”, which makes explicit the computations of 
a big-step semantics, intended as the sequences of execution steps of the natu- 
rally associated evaluation algorithm. Formally, we obtain a reduction relation 
on approximated proof trees, so non-termination and stuck computation are 
distinguished, and both soundness-must and soundness-may can be expressed. 

To this end, first of all we introduce a special result ?, so that a judgment
$c \Rightarrow ?$ (called incomplete, whereas a judgment in $\mathcal{R}$ is complete) means that the
evaluation of $c$ is not completed yet. Analogously to the previous constructions,
we define an augmented set of rules $\mathcal{R}_?$ for the judgment extended with ?:


? introduction rules These rules derive ? whenever a rule is partially applied:
for each rule $\rho = rule(j_1 \ldots j_n, j_{n+1}, c)$ in $\mathcal{R}$, index $i \in 1..n+1$, and result
$r \in R$, we define the rule $intro?(\rho, i, r)$ as
$$\dfrac{j_1 \ \ldots\ j_{i-1} \quad C(j_i) \Rightarrow r}{c \Rightarrow ?}$$
We also add an axiom
$$\dfrac{}{c \Rightarrow ?}$$
for each configuration $c \in C$.

? propagation rules These rules propagate ? analogously to those for
divergence and wrong propagation: for each $\rho = rule(j_1 \ldots j_n, j_{n+1}, c)$ in $\mathcal{R}$, and
index $i \in 1..n+1$, we add the rule $prop(\rho, i, ?)$ as follows:
$$\dfrac{j_1 \ \ldots\ j_{i-1} \quad C(j_i) \Rightarrow ?}{c \Rightarrow ?}$$
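As a rough illustration of how the augmented rule set is generated, the following Python sketch builds the ? introduction and ? propagation rules from one rule instance. The `Judgment` and `Rule` encodings, the string rendering of configurations, and the function names are assumptions made for the example, not the paper's formalisation.

```python
from typing import NamedTuple

UNKNOWN = "?"   # the special result "?"

class Judgment(NamedTuple):
    config: str
    result: str

class Rule(NamedTuple):
    premises: tuple        # j_1 ... j_{n+1}; the last one is the continuation
    conclusion: Judgment

def intro(rule, i, r):
    """intro?(rho, i, r): keep j_1..j_{i-1}, require C(j_i) => r, conclude c => ?."""
    ji = rule.premises[i - 1]
    premises = rule.premises[: i - 1] + (Judgment(ji.config, r),)
    return Rule(premises, Judgment(rule.conclusion.config, UNKNOWN))

def prop(rule, i):
    """prop(rho, i, ?): keep j_1..j_{i-1}, require C(j_i) => ?, conclude c => ?."""
    return intro(rule, i, UNKNOWN)

# An instance of an application rule for (\x.x) n, with its continuation last.
app = Rule(
    premises=(Judgment("\\x.x", "\\x.x"), Judgment("n", "n"), Judgment("n", "n")),
    conclusion=Judgment("(\\x.x) n", "n"),
)

for i in range(1, len(app.premises) + 1):
    print(prop(app, i))
```

In this encoding the two families differ only in the result attached to the i-th premise, which is why `prop` can reuse `intro`.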


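Finally, we consider the set $\mathcal{T}$ of the (finite) proof trees $\tau$ in $\mathcal{R}_?$. Each $\tau$ can
be thought of as a partial proof or partial evaluation of the root configuration. In
particular, we say it is complete if it is a proof tree in $\mathcal{R}$ (that is, it only contains
complete judgments), incomplete otherwise. We define a reduction relation $\xrightarrow{R}$

$$\dfrac{}{r \Rightarrow ?} \ \xrightarrow{R}\ \dfrac{}{r \Rightarrow r}$$

$$\dfrac{}{c \Rightarrow ?} \ \xrightarrow{R}\ (prop(\rho,1,?))\ \dfrac{\dfrac{}{c' \Rightarrow ?}}{c \Rightarrow ?} \qquad C(\rho) = c \quad C(\rho,1) = c'$$

$$(intro?(\rho,i,r))\ \dfrac{\tau_1 \ \ldots\ \tau_i}{c \Rightarrow ?} \ \xrightarrow{R}\ (\rho')\ \dfrac{\tau_1 \ \ldots\ \tau_i}{c \Rightarrow r} \qquad \rho \sim_i \rho' \quad R(\rho', i) = r \quad \#\rho' = i$$

$$(intro?(\rho,i,r))\ \dfrac{\tau_1 \ \ldots\ \tau_i}{c \Rightarrow ?} \ \xrightarrow{R}\ (prop(\rho',i+1,?))\ \dfrac{\tau_1 \ \ldots\ \tau_i \quad \dfrac{}{c' \Rightarrow ?}}{c \Rightarrow ?} \qquad \rho \sim_i \rho' \quad R(\rho', i) = r \quad C(\rho', i+1) = c'$$

$$(prop(\rho,i,?))\ \dfrac{\tau_1 \ \ldots\ \tau_{i-1} \quad \tau_i}{c \Rightarrow ?} \ \xrightarrow{R}\ (prop(\rho,i,?))\ \dfrac{\tau_1 \ \ldots\ \tau_{i-1} \quad \tau'_i}{c \Rightarrow ?} \qquad \tau_i \xrightarrow{R} \tau'_i \quad R_?(r(\tau'_i)) = ?$$

$$(prop(\rho,i,?))\ \dfrac{\tau_1 \ \ldots\ \tau_{i-1} \quad \tau_i}{c \Rightarrow ?} \ \xrightarrow{R}\ (intro?(\rho,i,r))\ \dfrac{\tau_1 \ \ldots\ \tau_{i-1} \quad \tau'_i}{c \Rightarrow ?} \qquad \tau_i \xrightarrow{R} \tau'_i \quad R_?(r(\tau'_i)) = r$$

Fig. 9. Reduction relation on $\mathcal{T}$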


on $\mathcal{T}$ such that, starting from the initial proof tree $\dfrac{}{c \Rightarrow ?}$, we derive a sequence
where, intuitively, at each step we detail the proof (evaluation). In this way, a
sequence ending with a complete tree $\dfrac{\ldots}{c \Rightarrow r}$ models terminating computation,
whereas an infinite sequence (tending to an infinite proof tree) models divergence,
and a stuck sequence models a stuck computation.


The one-step reduction relation $\xrightarrow{R}$ on $\mathcal{T}$ is inductively defined by the rules
in Fig. 9. In this figure $\#\rho$ denotes the number of premises of $\rho$, and $r(\tau)$ the
root of $\tau$. We set $R_?(c \Rightarrow u) = u$ where $u \in R \cup \{?\}$. Finally, $\sim_i$ is the equivalence
up-to an index of rules, introduced at the beginning of Sect. 3.2. As said above,
each reduction step makes the proof tree “less incomplete”. Notably, reduction
rules apply to nodes with consequence $c \Rightarrow ?$, whereas subtrees with root $c \Rightarrow r$
represent terminated evaluation. In detail:


— If the last applied rule is an axiom, and the configuration is a result $r$, then
we can evaluate $r$ to itself. Otherwise, we have to find a rule $\rho$ with $c$ in the
consequence and start evaluating the first premise of such rule.

— If the last applied rule is $intro?(\rho, i, r)$, then all subtrees are complete, hence,
to continue the evaluation, we have to find another rule $\rho'$, having, for each
$k \in 1..i$, as $k$-th premise the root of $\tau_k$. Then there are two possibilities: if
there is an $(i+1)$-th premise, we start evaluating it, otherwise, we propagate
to the conclusion the result $r$ of $\tau_i$.

— If the last applied rule is a propagation rule $prop(\rho, i, ?)$, then we simply
propagate the step made by $\tau_i$.
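The three cases above can be sketched for one concrete language. The code below is a simplified illustration, not the generic construction: it hard-codes a call-by-value λ-calculus with numerals, represents "?" as `None`, assumes closed programs (so substitution needs no renaming), and all names (`Tree`, `step`, `run`) are ours.

```python
from typing import NamedTuple, Optional

class Tree(NamedTuple):
    config: tuple
    result: Optional[tuple]      # None plays the role of "?"
    children: tuple = ()

def is_value(e):
    return e[0] in ("lam", "num")

def subst(e, x, v):
    """Replace free occurrences of x in e by the closed value v."""
    if e[0] == "var":
        return v if e[1] == x else e
    if e[0] == "lam":
        return e if e[1] == x else ("lam", e[1], subst(e[2], x, v))
    if e[0] == "app":
        return ("app", subst(e[1], x, v), subst(e[2], x, v))
    return e

def step(t):
    """One reduction step on a partial proof tree; None if no step applies."""
    if t.result is not None:
        return None                                   # complete tree
    c, kids = t.config, t.children
    if not kids:                                      # axiom c => ?
        if is_value(c):
            return Tree(c, c)                         # a result evaluates to itself
        if c[0] == "app":
            return Tree(c, None, (Tree(c[1], None),))  # start the first premise
        return None                                   # no rule applies: stuck
    last = kids[-1]
    if last.result is None:                           # propagation case
        nxt = step(last)
        return None if nxt is None else Tree(c, None, kids[:-1] + (nxt,))
    if len(kids) == 1:                                # function evaluated
        if last.result[0] != "lam":
            return None                               # stuck: not a function
        return Tree(c, None, kids + (Tree(c[2], None),))
    if len(kids) == 2:                                # argument evaluated
        _, x, body = kids[0].result
        return Tree(c, None, kids + (Tree(subst(body, x, last.result), None),))
    return Tree(c, last.result, kids)                 # continuation evaluated

def run(e, fuel=1000):
    t = Tree(e, None)
    for n in range(fuel):
        if t.result is not None:
            return ("value", t.result, n)
        nxt = step(t)
        if nxt is None:
            return ("stuck", n)
        t = nxt
    return ("timeout", fuel)     # possibly diverging

ident = ("lam", "x", ("var", "x"))
print(run(("app", ident, ("num", 5))))   # ('value', ('num', 5), 7)
```

A bounded driver cannot observe an infinite reduction sequence, so `run` only reports that the fuel was exhausted; the distinction between stuck and non-terminating sequences is nevertheless visible, since a stuck tree has no successor.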


In Fig. 10 we report an example of PEV reduction. 

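We end by stating the three constructions to be equivalent to each other,
thus providing a coherency result of the approach. In particular, first we show
that PEV is conservative with respect to $\mathcal{R}$, and this ensures the three
constructions are equivalent for finite computations. Then, we prove traces and wrong
constructions to be equivalent to PEV for diverging and stuck computations,
respectively, and this ensures they cover all possible cases.

$$\dfrac{}{(\lambda x.x)\,n \Rightarrow ?} \ \xrightarrow{R}\ \dfrac{\lambda x.x \Rightarrow ?}{(\lambda x.x)\,n \Rightarrow ?} \ \xrightarrow{R}\ \dfrac{\lambda x.x \Rightarrow \lambda x.x}{(\lambda x.x)\,n \Rightarrow ?} \ \xrightarrow{R}\ \dfrac{\lambda x.x \Rightarrow \lambda x.x \quad n \Rightarrow ?}{(\lambda x.x)\,n \Rightarrow ?}$$

$$\xrightarrow{R}\ \dfrac{\lambda x.x \Rightarrow \lambda x.x \quad n \Rightarrow n}{(\lambda x.x)\,n \Rightarrow ?} \ \xrightarrow{R}\ \dfrac{\lambda x.x \Rightarrow \lambda x.x \quad n \Rightarrow n \quad n \Rightarrow ?}{(\lambda x.x)\,n \Rightarrow ?}$$

$$\xrightarrow{R}\ \dfrac{\lambda x.x \Rightarrow \lambda x.x \quad n \Rightarrow n \quad n \Rightarrow n}{(\lambda x.x)\,n \Rightarrow ?} \ \xrightarrow{R}\ \dfrac{\lambda x.x \Rightarrow \lambda x.x \quad n \Rightarrow n \quad n \Rightarrow n}{(\lambda x.x)\,n \Rightarrow n}$$

Fig. 10. The evaluation in PEV of $(\lambda x.x)\,n$.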


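Theorem 9.
1. $\mathcal{R} \vdash c \Rightarrow r$ iff $\dfrac{}{c \Rightarrow ?} \xrightarrow{R}{}^{*}\ \tau$, where $r(\tau) = c \Rightarrow r$.
2. $\mathcal{R}_{tr} \vdash c \Rightarrow t$ for some $t \in C^{\omega}$ iff $\dfrac{}{c \Rightarrow ?} \xrightarrow{R}{}^{\omega}$.
3. $\mathcal{R}_{wr} \vdash c \Rightarrow \mathtt{wrong}$ iff $\dfrac{}{c \Rightarrow ?} \xrightarrow{R}{}^{*}\ \tau$, where $\tau$ is stuck.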


7 Related work 


Modeling divergence The issue of modelling divergence in big-step semantics 
dates back to [18], where a stratified approach with a separate coinductive judg- 
ment for divergence is proposed, also investigated in [30]. 

In [5] the authors model divergence by interpreting coinductively standard
big-step rules and considering also non-well-founded values. In [17] a similar tech-
nique is exploited, by adding a special result modelling divergence. Flag-based 
big-step semantics [36] captures divergence by interpreting the same semantic 
rules both inductively and coinductively. In all these approaches, spurious judge- 
ments can be derived for diverging computations. 

Other proposals [32,3] are inspired by the notion of definitional interpreter 
[37], where a counter limits the number of steps of a computation. Thus, diver- 
gence can be modelled on top of an inductive judgement: a program diverges if 
the timeout is raised for any value of the counter, hence it is not directly mod- 
elled in the definition. Instead, [20] provides a way to directly model divergence 
using definitional interpreters, relying on the coinductive partiality monad [16]. 
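The counter-based idea can be sketched as follows. This is a minimal illustration in the spirit of the cited proposals, not a rendering of any of them: the term encoding, the `TIMEOUT` sentinel and the function name `eval_fuel` are assumptions, and programs are assumed closed.

```python
TIMEOUT = object()   # sentinel returned when the counter runs out

def subst(e, x, v):
    """Replace free occurrences of x in e by the closed value v."""
    if e[0] == "var":
        return v if e[1] == x else e
    if e[0] == "lam":
        return e if e[1] == x else ("lam", e[1], subst(e[2], x, v))
    if e[0] == "app":
        return ("app", subst(e[1], x, v), subst(e[2], x, v))
    return e

def eval_fuel(e, fuel):
    """Inductive big-step evaluation bounded by a step counter."""
    if fuel == 0:
        return TIMEOUT
    if e[0] in ("lam", "num"):
        return e
    if e[0] == "app":
        f = eval_fuel(e[1], fuel - 1)
        if f is TIMEOUT:
            return TIMEOUT
        if f[0] != "lam":
            raise TypeError("stuck: applying a non-function")
        a = eval_fuel(e[2], fuel - 1)
        if a is TIMEOUT:
            return TIMEOUT
        return eval_fuel(subst(f[2], f[1], a), fuel - 1)
    raise TypeError("stuck: free variable")

ident = ("lam", "x", ("var", "x"))
w = ("lam", "x", ("app", ("var", "x"), ("var", "x")))
print(eval_fuel(("app", ident, ("num", 3)), 50))   # ('num', 3)
print(eval_fuel(("app", w, w), 50) is TIMEOUT)     # True
```

Divergence is not a result of `eval_fuel` itself: a program is considered diverging only if the timeout is returned for every value of the counter, which no single run can establish.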

The trace semantics in Sect. 3.1 has been inspired by [29]. Divergence propa- 
gation rules are very similar to those used in [8,9] to define a big-step judgment 
which directly includes divergence as result. However, this direct definition relies 
on a non-standard notion of inference system, allowing corules [7,19], whereas 
for the trace semantics presented in this work standard coinduction is enough, 
since all rules are productive, that is, they always add an element to the trace. 

Differently from all the previously cited papers, which consider specific
examples, the work [2] shares with us the aim of providing a generic construction to
model non-termination, based on an arbitrary big-step semantics. Ager
considers a class of big-step semantics identified by a specific shape of rules, and defines,
in a small-step style, a proof-search algorithm which follows the big-step rules; in
this way, converging, diverging and stuck computations are distinguished. This
approach is somewhat similar to our PEV semantics, even though the transition
system we propose is directly defined on proof trees.

There is an extensive body of work on coalgebraic techniques, where the 
difference between semantics can be simply expressed by a change of functor. 
In this paper we take a set-theoretic approach, simple and accessible to a large 
audience. Furthermore, as far as we know [38], coalgebras abstract several kinds 
of transition systems, thus being more similar to a small-step approach. In our 
understanding, the coalgebra models a single computation step with possible 
effects, and from this it is possible to derive a unique morphism into the final 
coalgebra modelling the “whole” semantics. Our trace semantics, being big-step, 
seems to roughly correspond to directly get this whole semantics. In other words, 
we do not have a coalgebra structure on configurations. 


Proving soundness As we have discussed, also proving (type) soundness with 
respect to a big-step semantics is a challenging task, and some approaches have 
been proposed in the literature. In [24], to show soundness of large steps seman- 
tics, they prove a coverage lemma, which ensures that the rules cover all cases, 
including error situations. In [30] the authors prove a soundness property similar 
to Theorem 4, but by using a separate judgment to represent divergence, thus 
avoiding using traces. In [5] there is a proof of soundness of a coinductive type 
system with respect to a coinductive big-step semantics for a Java-like language, 
defining a relation between derivations in the type system and in the big-step 
semantics. In [8] there is a proof principle, used to show type soundness with 
respect to a big-step semantics defined by an inference system with corules [7]. 
In [4] the proof of type soundness of a calculus formalising path-dependent types 
relies on a big-step semantics, while in [3] soundness is shown for the
polymorphic type system F<: and for the DOT calculus, using definitional interpreters
to model the semantics. In both cases they extend the original semantics adding 
error and timeout, and adopt inductive proof strategies, as in [39]. A similar 
approach is followed by [32] to show type soundness of the Core ML language. 

Also [6] proposes an inductive proof of type soundness for the big-step se- 
mantics of a Java-like language, but relying on a notion of approximation of 
infinite derivation in the big-step semantics. 

Pretty big-step semantics [17] aims at providing an efficient representation 
of big-step semantics, so that it can be easily extended without duplication of 
meta-rules. In order to define and prove soundness, they propose a generic er- 
ror rule based on a progress judgment, whose definition can be easily derived 
manually from the set of evaluation rules. This is partly similar to our wrong 
extension, with two main differences. First, by factorising rules, they introduce 
intermediate steps as in small-step semantics, hence there are similar problems 
when intermediate steps are ill-typed (as in Sect. 5.2, Sect. 5.4). Second, wrong
introduction is handled by the progress judgment, that is, at the level of
side-conditions. Moreover, in [13] there is a formalisation of the pretty-big-step rules
for performing a generic reasoning on big-step semantics by using abstract
interpretation. However, the authors say that they interpret rules inductively, hence
non-terminating computations are not modelled.

Finally, some (but not all) infinite trees of our trace semantics can be seen as 
cyclic proof trees, see end of Sect. 3.1. Proof systems supporting cyclic proofs can 
be found, e.g., in [14,15] for classical first order logic with inductive definitions. 


8 Conclusion and future work 


The most important contribution is a general approach for reasoning on sound- 
ness with respect to a big-step operational semantics. Conditions can be proven 
by a case analysis on the semantic (meta-)rules avoiding small-step-style inter- 
mediate configurations. This can be crucial since there are calculi where the 
property to be checked is not preserved by such intermediate configurations, 
whereas it holds for the final result, as illustrated in Sect. 5. 

In future work, we plan to use the meta-theory in Sect. 2 as basis to investi- 
gate yet other constructions, notably the approach relying on corules [8,9], and 
that, adding a counter, based on timeout [32,3]. 

We also plan to compare our proof technique for proving soundness with the 
standard one for small-step semantics: if a predicate satisfies progress and subject 
reduction with respect to a small-step semantics, does it satisfy our soundness 
conditions with respect to an equivalent big-step semantics? To formally prove 
such a statement, the first step will be to express equivalence between small-step 
and big-step semantics. On the other hand, the converse does not hold, as shown 
by the examples in Sect. 5.2 and Sect. 5.4. 

As far as significant applications are concerned, we plan to use the approach to
prove soundness for the λ-calculus with full reduction and intersection/union
types [10]. The interest of this example lies in the failure of subject reduction,
as discussed in Sect. 5.4. In another direction, we want to enhance MiniFJ&O
with λ-abstractions and to allow intersection and union types everywhere [23].
This will extend typability of shared expressions. We plan to apply our approach
to the big-step semantics of the statically typed virtual classes calculus developed
in [24], discussing also the non-terminating computations not considered there.

With regard to proofs, which are mostly omitted here and can be found in
the extended version at http://arxiv.org/abs/2002.08738, we plan to investigate
whether we can simplify them by means of enhanced coinductive techniques.

As a proof-of-concept, we provided a mechanisation⁶ in Agda of Lemma 1.
The mechanisation of the other proofs is similar. However, as future work, we
think it would be more interesting to provide a software tool for writing big-step
definitions and for checking that the soundness conditions hold.


Acknowledgments The authors are grateful to the referees: the paper strongly 
improved thanks to their useful suggestions and remarks. 


6 Available at https://github.com/fdgn/soundness-big-step-semantics.

Abstract. Abstract garbage collection and the use of pushdown systems
each enhance the precision of control-flow analysis (CFA). However, their 
respective needs conflict: abstract garbage collection requires the stack 
but pushdown systems obscure it. Though several existing techniques 
address this conflict, none take full advantage of the underlying interplay. 
In this paper, we dissolve this conflict with a technique which exploits 
the precision of pushdown systems to decompose the heap across the 
continuation. This technique liberates abstract garbage collection from 
the stack, increasing its effectiveness and the compositionality of its host 
analysis. We generalize our approach to apply compositional treatment to 
abstract timestamps which induces the context abstraction of m-CFA, an 
abstraction more precise than k-CFA’s for many common programming 
patterns. 


Keywords: Control-Flow Analysis - Abstract Garbage Collection - Push- 
down Systems 


1 Introduction 


Among the many enhancements available to improve the precision of control-flow 
analysis (CFA), abstract garbage collection and pushdown models of control flow 
stand out as particularly effective ones. But their combination is non-trivial. 

Abstract garbage collection (GC) [10] is the result of applying standard GC— 
which calculates the heap data reachable from a root set derived from a given 
environment and continuation—to an abstract semantics. Though it operates in 
the same way as concrete GC, abstract GC has a different effect on the semantics 
to which it’s applied. Concrete GC is semantically irrelevant in that it has no 
effect on a program’s observable behavior.³ Abstract GC, on the other hand,
is semantically relevant in that, by eliminating some merging in the abstract 
heap, it prevents a utilizing CFA from conflating some distinct heap data. In the 
setting of a higher-order language, where data can represent control, this superior 
approximation of data translates to a superior approximation of control as well, 
manifest by the CFA exploring fewer infeasible execution paths. 

Pushdown models of control flow [16,3] encode the call-return relation of a 
program’s flow of execution as precisely as an unbounded control stack would 


3 It is irrelevant only if space consumption is unobservable, as is typical. 


© The Author(s) 2020 
P. Müller (Ed.): ESOP 2020, LNCS 12075, pp. 197–223, 2020.
https://doi.org/10.1007/978-3-030-44914-8_8


198  K. Germane and M. D. Adams 


allow. Consequently, and in contrast to the finite-state models which preceded 
them, pushdown models enable a utilizing CFA—a stack-precise CFA—to avoid 
relating a given return to any but its originating call. Thus, pushdown models 
also induce CFAs which explore fewer infeasible execution paths. 

Not only do abstract GC and pushdown systems each enhance the control 
precision of CFA, they also appear to do so in complementary ways. Is it possible 
for a CFA to use both and gain the benefits of each? This question’s answer is 
not immediate, as these techniques have competing requirements: abstract GC 
must examine the stack to extract the root set of reachability but the use of 
pushdown models obscures the control stack to the abstract semantics. 

This question has been addressed by two techniques. The first, an introspective
technique [4], introduces a primitive operation into the analyzing machine
which introspects the stack and delivers the set of frames which may be live;
this technique has a variety of alternative formulations, some of which alter its
complexity–precision profile [8,7]. The second technique [1], which modifies the
first to work with definitional interpreters, dictates that the analyzer implement 
a set-passing style abstract semantics where each passed set contains the heap 
addresses present in the continuation at that point. Each of these techniques 
reconciles the competing requirements of abstract GC and pushdown models 
of control flow and allows the utilizing CFA to enjoy the precision-enhancing 
benefits of both at once. 

However, each of these techniques—hereafter referred to collectively as push- 
down GC—yields a setting in which abstract GC and pushdown models of con- 
trol flow merely coexist. In contrast, this paper prescribes a technique which 
exploits the pushdown model of control flow to enable a new mode of garbage 
collection—compositional garbage collection—which does not require the ability 
to inspect the continuation. 

The key observation is that, in a stack-precise CFA, the heap present at the 
point of a call is in scope at the point of its return. Thus, the analysis can offload 
some of the contents of the callee’s heap to the caller’s—in particular, the data 
irrelevant to the callee’s execution. When this offloading is performed, the final 
heap of the callee (just as it returns) is incomplete with respect to subsequent 
execution. But, since the caller’s heap is in scope at this point, the analysis can 
reconstitute the subsequent heap by combining the caller’s heap with the callee’s 
final heap. 

The data relevant to the callee’s execution is the data reachable from its 
local environment and excludes the data reachable from its continuation alone. 
Offloading heap data, then, consists of GC-ing each callee’s heap with respect 
to its local environment only. When one applies this practice consistently to all 
calls, one associates with each active call not a heap but a heap fragment, effec- 
tively decomposing the heap across the continuation. As we will show, careful 
separation and combination of these heap fragments can perfectly simulate the 
presence of the full heap. 

This liberation of GC from the continuation has several consequences for the 
host CFA. 

1. It simplifies both the formalization and implementation of the host CFA, 
since it can omit the relatively complex machinery to ensure the continuation- 
resident addresses are at hand. 

2. It reduces the host CFA’s workload by not requiring it to traverse full heaps. 
Earl et al. [4] observe that traversal of large heaps observably increases anal- 
ysis time. 

3. It recovers context irrelevance in the host CFA’s semantics, a property we 
discuss more in Section 3.4 and Section 6.1. 

4. It enables purely-local execution summaries which makes memoization much 
more effective. 


In sum, relative to pushdown GC, compositional GC offers the host CFA both 
quantitative benefits (it is strictly more powerful) and qualitative ones. 


1.1 Examples 


Let’s look at an example where compositional GC makes memoization more 
effective. Consider the following Scheme program 


(let* ([id (lambda (x) x)]
       [y (id 42)]
       [z (id y)])
  (+ y z))


which calls id twice, each time on 42. 

We would hope that a CFA would be able to memoize its analysis of the 
first call and, upon recognizing that the second call is semantically-identical, re- 
use its results. However, contemporary CFAs will not because each call is made 
with a different heap—the second call’s heap includes a binding for y that the 
first’s doesn’t. Moreover, this distinction persists even with pushdown GC since 
y’s binding is needed to continue execution after the call. Since CFAs have no 
means but reachability to determine what is relevant to a given execution point, 
and since what is relevant constitutes a memoization key, pushdown GC is too 
weak to identify these two calls. 

In contrast, a CFA with compositional GC produces a heap fragment for 
each call which is closed over only data reachable from the local environment— 
for a call, the procedure and argument values themselves. Accordingly, from its 
perspective, these two calls are identical and specify a single memoization key. 
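
The effect on memoization can be sketched in Python. The encoding below is an assumption made for illustration and is not the paper's formalization: stores are dicts from addresses to values, a closure is a tuple that lists the addresses it closes over, and the helper names `reachable`, `touches`, and `call_key` are hypothetical. The whole-store keys stand in for what a threaded store would compare.

```python
def reachable(store, roots):
    """Addresses reachable in `store` from the address set `roots`."""
    seen, work = set(), list(roots)
    while work:
        addr = work.pop()
        if addr in seen or addr not in store:
            continue
        seen.add(addr)
        work.extend(touches(store[addr]))
    return seen


def touches(value):
    """Addresses a value refers to directly (only closures refer to any)."""
    if isinstance(value, tuple) and value[0] == "closure":
        return value[2]
    return ()


def call_key(store, operator, argument):
    """Key a call by its operator, its argument, and the store fragment they reach."""
    roots = set(touches(operator)) | set(touches(argument))
    live = reachable(store, roots)
    fragment = frozenset((a, store[a]) for a in live)
    return (operator, argument, fragment)


# id closes over nothing, so it reaches nothing in the store.
ID = ("closure", "(lambda (x) x)", ())

store1 = {"id": ID}            # store at the first call, (id 42)
store2 = {"id": ID, "y": 42}   # store at the second call, (id y), with y bound to 42

key1 = call_key(store1, ID, 42)
key2 = call_key(store2, ID, 42)

# Keys that carry the whole store differ because of y's binding.
whole1 = (ID, 42, frozenset(store1.items()))
whole2 = (ID, 42, frozenset(store2.items()))
```

Under these assumptions `key1` and `key2` coincide, so the second call can reuse the first call's result, whereas `whole1` and `whole2` differ.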

Now let’s look at an example where compositional GC keeps co-live bindings 
of the same variable distinct. Consider the following Scheme program 


(letrec ([f (lambda (x)
              (if (prime? x)
                  (let ([y (f (+ x 1))])
                    (+ x y))
                  x))])
  (f 2))

which defines and calls a recursive procedure f. 

Concrete evaluation of this program first calls f with 2, then with 3, and then 
with 4, which returns 4; the pending calls then return 3 + 4 = 7 and 2 + 7 = 9. The procedure 
f is properly recursive—so these calls are nested—and, after f is called with 
4 but before it returns, three distinct bindings of x are live. Moreover, since 
each binding of x is needed until its binding call returns, each is continuation- 
reachable and therefore not claimed by GC. These facts and limitations translate 
to the analysis setting: a CFA will discover multiple co-live bindings of x which 
persist in the face of pushdown GC. Consequently, even with pushdown GC, a 
CFA will in general join these bindings to some degree, concluding that x can 
be 2 whenever it can be 3 and can be 3 whenever it can be 4. 

In contrast, just before a CFA with compositional GC performs each call 
to f, it GCs with respect to the operator and argument values which, in each 
case, consist of the closure of f (which reaches only itself in the heap) and a 
number (which doesn’t reach anything). Thus, each binding to x is the first in its 
respective heap fragment and doesn’t interfere with the live bindings of x in other 
heap fragments. Using a numeric abstraction in which arithmetic operations 
propagate but do not introduce approximation [1], a CFA with compositional 
GC will produce an exact answer (whereas one with pushdown GC will not). 


1.2 Generalizing the Approach 


The conventional treatment of the heap by CFA is to thread it through execution, 
allowing it to evolve as it goes. In contrast, compositional GC advocates that the 
CFA treat the heap with the same discipline that it treats the environment: saved 
at the evaluation of a subexpression and restored when its evaluation completes 
and its value is delivered. That is, compositional GC is achieved by, in effect, 
treating the heap compositionally. 

What happens if we impose the same compositional discipline on other 
threaded components, such as the timestamp? In that case, we move from the 
last-k-call-sites⁴ context abstraction of k-CFA [14] to the top-m-stack-frames⁵ 
context abstraction of m-CFA [11]. This appearance of m-CFA’s abstraction in 
a stack-precise CFA is the first such, to our knowledge. 

With compositional treatment of both the heap and timestamp, we arrive 
at a stack-precise CFA which treats each of its components compositionally. 
This treatment also leads to a CFA closer to being compositional in the sense 
that the analysis of a compound expression is a function of the analyses of 
its constituent parts. Accordingly, we refer to such a stack-precise CFA as a 
compositional control-flow analysis. 

The remainder of the paper is as follows. We first introduce the syntax of the 
language we will use throughout the paper in Section 2. We then discuss the en- 
hancements of perfect stack precision, garbage collection, and their combination 
in Section 3. We then proceed through a series of semantics which transition 


4 as in, most-recent k call sites 
5 as in, youngest m stack frames 
from a threaded heap to a compositional, garbage-collected heap in Section 4. 
We then abstract the compositional semantics to obtain our CFA in Section 5. 
We discuss the ramifications of the compositional treatment of each of the heap 
and abstract time in Section 6. We finally discuss related work in Section 7 and 
conclusions and future work in Section 8. 


Note In the remainder of the paper, we use the standard term store to refer 
to the analysis component which models the heap. Thus, we will describe our 
technique as, e.g., treating stores compositionally. 


2 A-Normal Form λ-Calculus 


For presentation, we keep the language small: we use a unary λ-calculus in 
A-normal form [5], the grammar of which is given below. 


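$$
\begin{aligned}
\mathit{Exp} \ni e &::= ce \mid \mathbf{let}\ x = ce\ \mathbf{in}\ e \\
\mathit{CExp} \ni ce &::= ae \mid (ae_0\ ae_1) \mid \mathbf{set!}\ x\ ae \\
\mathit{AExp} \ni ae &::= x \mid \lambda x.e \\
\mathit{Var} \ni x &\quad [\text{an infinite set of variables}]
\end{aligned}
$$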


A proper expression e is a call expression ce or a let-expression, which binds 
a variable to the result of a call expression. (Restricting the bound expression 
to a call expression prevents let-expressions from nesting there, a hallmark of 
A-normal form.) A call expression ce is an atomic expression ae, an application, 
or a set!-expression. An atomic expression ae is a variable reference or a λ 
abstraction. 

Atomic expressions are trivial [13]. We include set!-expressions to produce 
mutative effects that must be threaded through evaluation. (The approach we 
present in this paper can also handle more-general forms of mutation, such as 
boxes.) For our purposes, we consider a set!-expression “serious” [13] since it has 
an effect on the store. 

A program is a closed expression; we assume (without loss of generality) that 
programs are alphatised—that is, that each bound variable has a distinct name. 

Expressions of the form (ae₀ ae₁) for some ae₀ and ae₁ constitute the set 
App; similarly, expressions of the form λx.e for some x and e constitute the set 
Lam. 
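
As a rough illustration, the grammar can be transcribed into Python dataclasses. The class and field names below (`Var`, `Lam`, `App`, `SetBang`, `Let`, `free_vars`) are choices made for this sketch rather than names fixed by the paper, and `free_vars` is included only because the restriction of an environment to an expression's free variables is used later.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:            # ae ::= x
    name: str


@dataclass(frozen=True)
class Lam:            # ae ::= λx.e
    param: str
    body: "Exp"


@dataclass(frozen=True)
class App:            # ce ::= (ae0 ae1)
    operator: "AExp"
    operand: "AExp"


@dataclass(frozen=True)
class SetBang:        # ce ::= set! x ae
    name: str
    value: "AExp"


@dataclass(frozen=True)
class Let:            # e ::= let x = ce in e
    name: str
    bound: "CExp"
    body: "Exp"


AExp = Union[Var, Lam]
CExp = Union[Var, Lam, App, SetBang]
Exp = Union[Var, Lam, App, SetBang, Let]


def free_vars(e: Exp) -> frozenset:
    """Free variables of an expression; a program must have none."""
    if isinstance(e, Var):
        return frozenset({e.name})
    if isinstance(e, Lam):
        return free_vars(e.body) - {e.param}
    if isinstance(e, App):
        return free_vars(e.operator) | free_vars(e.operand)
    if isinstance(e, SetBang):
        return frozenset({e.name}) | free_vars(e.value)
    if isinstance(e, Let):
        return free_vars(e.bound) | (free_vars(e.body) - {e.name})
    raise TypeError(f"not an expression: {e!r}")


# let id = λx.x in let y = (id id) in y
program = Let("id", Lam("x", Var("x")),
              Let("y", App(Var("id"), Var("id")), Var("y")))
```

Because `Let.bound` is a call expression rather than a general expression, a let cannot nest in binding position, which mirrors the A-normal form restriction described above.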


3 Background 


In this section, we review abstract garbage collection and the k-CFA context 
abstraction. We begin by introducing a small-step concrete semantics which 
defines the ground truth of evaluation. 


3.1 Semantic Domains 


First, we introduce some semantic components that we will use heavily through- 
out the rest of the paper. 


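$$
\begin{aligned}
v \in \mathit{Val} &= \mathit{Lam} \times \mathit{Env} & \rho \in \mathit{Env} &= \mathit{Var} \to \mathit{Time} \\
t \in \mathit{Time} &= \mathit{App}^{*} & a \in \mathit{Address} &= \mathit{Var} \times \mathit{Time} \\
\sigma \in \mathit{Store} &= \mathit{Address} \to \mathit{Val} & \kappa \in \mathit{Cont} &::= \mathbf{mt} \mid \mathbf{lt}(x, \rho, e, \kappa)
\end{aligned}
$$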


A value v is a closure, a pair of a λ abstraction and an environment which closes 
it. An environment ρ is a finite map from each variable x to a time t; a time t 
is a finite sequence of call sites. Let ρ|e denote the restriction of the domain of 
the environment ρ to the free variables of e. An address a is a pair of a variable 
and time and a store σ is a map from addresses to values. A continuation κ is 
either the empty continuation or the continuation of a let binding. 


3.2 Concrete Semantics 


We define our concrete semantics as a small-step relation over abstract machine 
states. The state space of our machine is given formally as follows. 


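$$
\begin{aligned}
\varsigma \in \mathit{State} &= \mathit{Eval} + \mathit{Apply} \\
\varsigma_{ev} \in \mathit{Eval} &= \mathit{Exp} \times \mathit{Env} \times \mathit{Store} \times \mathit{Cont} \times \mathit{Time} \\
\varsigma_{ap} \in \mathit{Apply} &= \mathit{Val} \times \mathit{Store} \times \mathit{Cont} \times \mathit{Time}
\end{aligned}
$$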


Machine states come in two variants. An Eval machine state represents a point 
in execution in which an expression will be evaluated; it contains registers for 
an expression e, its closing environment ρ, the store σ (modelling the heap), the 
continuation κ (modelling the stack), and the time t. An Apply machine state 
represents a point in execution at which a value is in hand and must be delivered 
to the continuation; it contains registers for the value v to deliver, the store σ, 
the continuation κ, and the time t. 

Figure 1 contains the definitions of two relations over machine states, the 
union of which constitutes the small-step relation. The $\to_{ev}$ relation transitions 
an Eval state to its successor. The LET rule pushes a continuation frame to save 
the bound variable, environment, and body expression. The resultant Eval state 
is poised to evaluate the bound expression ce. The CALL rule first uses aeval, 
defined 

$$
\mathit{aeval}(\sigma, \rho, x) = \sigma(x, \rho(x)) \qquad \mathit{aeval}(\sigma, \rho, \lambda x.e) = (\lambda x.e, \rho|_{\lambda x.e})
$$


to obtain values for each of the operator and argument. It then increments 
the time, extends the store and environment with the incremented time, and 
arranges evaluation of the operator body at the incremented time. The SET! rule 
remaps a location in the store designated by a given variable (which is resolved in 
the environment) to a value obtained by aeval. It returns the identity function. 
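
$$
\textsc{Let}\quad
\mathit{ev}(\mathbf{let}\ x = ce\ \mathbf{in}\ e, \rho, \sigma, \kappa, t) \to_{ev} \mathit{ev}(ce, \rho, \sigma, \mathbf{lt}(x, \rho, e, \kappa), t)
$$

$$
\textsc{Call}\quad
\frac{
  \begin{array}{c}
  (\lambda x.e, \rho') = \mathit{aeval}(\sigma, \rho, ae_0) \quad v = \mathit{aeval}(\sigma, \rho, ae_1) \quad t' = (ae_0\ ae_1) :: t \\
  \rho'' = \rho'[x \mapsto t'] \quad \sigma' = \sigma[(x, t') \mapsto v]
  \end{array}
}{
  \mathit{ev}((ae_0\ ae_1), \rho, \sigma, \kappa, t) \to_{ev} \mathit{ev}(e, \rho'', \sigma', \kappa, t')
}
$$

$$
\textsc{Set!}\quad
\frac{v = \mathit{aeval}(\sigma, \rho, ae) \quad a = (x, \rho(x)) \quad \sigma' = \sigma[a \mapsto v]}
     {\mathit{ev}(\mathbf{set!}\ x\ ae, \rho, \sigma, \kappa, t) \to_{ev} \mathit{ap}((\lambda x.x, \bot), \sigma', \kappa, t)}
$$

$$
\textsc{Atomic}\quad
\frac{v = \mathit{aeval}(\sigma, \rho, ae)}
     {\mathit{ev}(ae, \rho, \sigma, \kappa, t) \to_{ev} \mathit{ap}(v, \sigma, \kappa, t)}
\qquad
\textsc{Apply}\quad
\frac{\rho' = \rho[x \mapsto t] \quad \sigma' = \sigma[(x, t) \mapsto v]}
     {\mathit{ap}(v, \sigma, \mathbf{lt}(x, \rho, e, \kappa), t) \to_{ap} \mathit{ev}(e, \rho', \sigma', \kappa, t)}
$$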


Fig. 1. Small-step abstract machine semantics 


The ATOMIC rule evaluates an atomic expression. The APPLY rule applies a 
continuation to a value, extending the environment and store and arranging for 
the evaluation of the let body. 

We inject a program pr into the initial evaluation state ev(pr, ⊥, ⊥, mt, ⟨⟩) 
which arranges evaluation in the empty environment, empty store, halt 
continuation, and empty time. 


Adding Garbage Collection At this point, we have a small-step relation 
defining execution by abstract machine and are perfectly positioned to apply, 
e.g., the Abstracting Abstract Machines (AAM) [15] recipe to abstract the se- 
mantics and thereby obtain a sound, computable CFA. Before doing so, however, 
we will extend our semantics to garbage-collect the store on each transition. This 
extension has no semantic effect in the concrete semantics but, as we will discuss, 
greatly increases the precision of the abstracted (or, simply, abstract) semantics. 

We extend the semantics by defining two garbage collection transitions, one 
which collects an Eval state and one which collects an Apply state. Because our 
abstract machine explicitly models local environments, heaps (via stores), and 
stacks (via continuations), we can apply a copying collector to perform garbage 
collection. 

First, we define a family root of metafunctions to extract the reachability 
root set from values, environments, and continuations. 


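$$
\begin{aligned}
\mathit{root}_v(\lambda x.e, \rho) &= \mathit{root}_\rho(\rho) & \mathit{root}_\kappa(\mathbf{mt}) &= \emptyset \\
\mathit{root}_\rho(\rho) &= \rho & \mathit{root}_\kappa(\mathbf{lt}(x, \rho, e, \kappa)) &= \mathit{root}_\rho(\rho|_e) \cup \mathit{root}_\kappa(\kappa)
\end{aligned}
$$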


The $\mathit{root}_v$ metafunction extracts the root addresses from a closure by using $\mathit{root}_\rho$ 
to extract the root addresses from its environment. By the $\mathit{root}_\rho$ metafunction, 
the root addresses of an environment are simply the variable-time pairs that 
define it; that is, the definition of $\mathit{root}_\rho$ views its argument ρ extensionally as 
a set of addresses. The $\mathit{root}_\kappa$ metafunction extracts the root addresses from a 
continuation. The empty continuation has no root addresses whereas the root 
addresses of a non-empty continuation are those of its stored environment (re- 
stricted to the free variables of the expression it closes) combined with those of 
the continuation it extends. 

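Next, we define a reachability relation $\to_\sigma$, parameterized by a store σ and 
over addresses, by 

$$
a_0 \to_\sigma a_1 \iff a_1 \in \mathit{root}_v(\sigma(a_0))
$$

We then define the reachability of a root set with respect to a store 

$$
R(\sigma, A) = \{ a' : a \in A,\ a \to_\sigma^{*} a' \}
$$

where $\to_\sigma^{*}$ is the reflexive, transitive closure of $\to_\sigma$. From here, we obtain the 
transitions 

$$
\textsc{GC-Eval}\quad
\frac{A = \mathit{root}_\rho(\rho|_e) \cup \mathit{root}_\kappa(\kappa) \quad \sigma' = \sigma|_{R(\sigma, A)}}
     {\mathit{ev}(e, \rho, \sigma, \kappa, t) \to_{GC} \mathit{ev}(e, \rho, \sigma', \kappa, t)}
$$

$$
\textsc{GC-Apply}\quad
\frac{A = \mathit{root}_v(v) \cup \mathit{root}_\kappa(\kappa) \quad \sigma' = \sigma|_{R(\sigma, A)}}
     {\mathit{ap}(v, \sigma, \kappa, t) \to_{GC} \mathit{ap}(v, \sigma', \kappa, t)}
$$

where $\sigma|_{R(\sigma, A)}$ is σ restricted to the reachable addresses R(σ, A). We compose 
this garbage-collecting transition with each of $\to_{ev}$ and $\to_{ap}$. Altogether, the 
garbage-collecting semantics are given by $\to_{GC} \circ [\to_{ev} \cup \to_{ap}]$. 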
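
The root metafunctions and the GC-Eval transition can be sketched in Python as follows. The encoding is assumed for illustration only: an environment is a dict from variables to times, a closure is a pair `(lam, env)`, the empty continuation is `None`, and a let continuation is a tuple `("lt", x, env, body, rest)` whose stored environment is taken to be already restricted to the free variables of `body`.

```python
def root_env(env):
    """root_rho: view an environment {var: time} as a set of addresses."""
    return set(env.items())


def root_val(value):
    """root_v: a closure (lam, env) contributes the roots of its environment."""
    _lam, env = value
    return root_env(env)


def root_cont(cont):
    """root_kappa: union of the roots of every frame's stored environment."""
    roots = set()
    while cont is not None:
        _tag, _x, env, _body, cont = cont
        roots |= root_env(env)
    return roots


def reach(store, roots):
    """R(sigma, A): addresses reachable from the root set through stored closures."""
    seen, work = set(), list(roots)
    while work:
        a = work.pop()
        if a in seen:
            continue
        seen.add(a)
        if a in store:
            work.extend(root_val(store[a]))
    return seen


def gc_eval(env, store, cont):
    """GC-Eval: restrict the store to what the environment and continuation reach."""
    live = reach(store, root_env(env) | root_cont(cont))
    return {a: v for a, v in store.items() if a in live}


t0, t1 = (), ("call1",)
store = {
    ("id", t0): ("lam-id", {}),
    ("y", t0): ("lam-id", {}),
    ("x", t1): ("lam-k", {"id": t0}),   # bound by a call that has returned
}
env = {"y": t0}
cont = ("lt", "z", {"id": t0}, "body", None)
collected = gc_eval(env, store, cont)
```

In this example the binding of `x` is reachable from neither the environment nor the continuation, so `collected` retains only the entries for `id` and `y`.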


3.3 Abstracting Abstract Machines with Garbage Collection 


Now that we have a small-step abstract machine semantics with GC, we are 
ready to apply the AAM recipe to obtain a sound, computable CFA with GC. 

We apply the AAM recipe in two steps. 

First, we refactor the state space so that all inductively-defined components 
are redirected through the store. Practically, this refactoring has the effect of 
allocating continuations in the store. For our semantics, this refactoring yields 
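the state space $\mathit{State}_{sa}$, defined 

$$
\begin{aligned}
\mathit{State}_{sa} &= \mathit{Eval}_{sa} + \mathit{Apply}_{sa} \\
\mathit{Eval}_{sa} &= \mathit{Exp} \times \mathit{Env} \times \mathit{Store}_{sa} \times \mathit{ContAddr} \times \mathit{Time} \\
\mathit{Apply}_{sa} &= \mathit{Store}_{sa} \times \mathit{ContAddr} \times \mathit{Val} \times \mathit{Time}
\end{aligned}
$$

in which a continuation address α ∈ ContAddr replaces the continuation drawn 
from Cont. The space of continuations becomes defined by 

$$
\kappa_{sa} \in \mathit{Cont}_{sa} ::= \mathbf{mt} \mid \mathbf{lt}(x, \rho, e, \alpha)
$$

and of stores by 

$$
\mathit{Store}_{sa} = \mathit{Address} + \mathit{ContAddr} \to \mathit{Val} + \mathit{Cont}_{sa}
$$

Not reflected in this structure is the typical constraint that an address a will 
only ever locate a value and a continuation address α will only ever locate a 
continuation. 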

Second, we finitely partition the unbounded address space of the store and 
treat the constituent sets as abstract addresses (via some finite representative). 
Practically, this partitioning is achieved by limiting the time t to at most k call 
sites where k becomes a parameter of the CFA (leading to the designation k- 
CFA). Any addresses which agree on the k-length prefix of their time component 
are identified and the finite representative for this set of addresses uses simply 
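that prefix. Accordingly, we define an abstract time domain $\widehat{\mathit{Time}} = \mathit{App}^{\le k}$ 
and let it reverberate through the state space definitions, obtaining 

$$
\begin{aligned}
\widehat{\mathit{State}} &= \widehat{\mathit{Eval}} + \widehat{\mathit{Apply}} \\
\widehat{\mathit{Eval}} &= \mathit{Exp} \times \widehat{\mathit{Env}} \times \widehat{\mathit{Store}} \times \widehat{\mathit{ContAddr}} \times \widehat{\mathit{Time}} \\
\widehat{\mathit{Apply}} &= \widehat{\mathit{Store}} \times \widehat{\mathit{ContAddr}} \times \widehat{\mathit{Val}} \times \widehat{\mathit{Time}}
\end{aligned}
$$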


(in which we allow the definition of ContAddr to depend, directly or not, on that 
of Time). 

Finitization of the address space is key to producing a computable CFA. 
Practically, however, it means that some values previously located by distinct 
addresses will afterward be located by the same abstract address. When this conflation 
occurs, the CFA must behave as if either access was intended; this behavior is 
manifested by non-deterministically choosing the value located by a particular 
address. Because our language is higher-order, this non-determinism also affects 
the control flows the CFA considers. This effect is evident in the CALL rule 
defined 


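$$
\textsc{Call}\quad
\frac{
  \begin{array}{c}
  (\lambda x.e, \hat{\rho}') \in \mathit{aeval}(\hat{\sigma}, \hat{\rho}, ae_0) \quad \hat{v} = \mathit{aeval}(\hat{\sigma}, \hat{\rho}, ae_1) \quad \hat{t}' = \lfloor (ae_0\ ae_1) :: \hat{t} \rfloor_k \\
  \hat{\rho}'' = \hat{\rho}'[x \mapsto \hat{t}'] \quad \hat{\sigma}' = \hat{\sigma} \sqcup [(x, \hat{t}') \mapsto \hat{v}]
  \end{array}
}{
  \mathit{ev}((ae_0\ ae_1), \hat{\rho}, \hat{\sigma}, \hat{\alpha}, \hat{t}) \to_{ev} \mathit{ev}(e, \hat{\rho}'', \hat{\sigma}', \hat{\alpha}, \hat{t}')
}
$$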


which is structurally identical to that of the concrete semantics except in two 
respects: 


1. The abstract evaluation of the operator ae₀ may yield multiple closures and 
the CFA considers the application of each. Due to the approximation finitiza- 
tion introduces, not every abstractly-applied closure will necessarily appear 
in a compatible call under the concrete semantics. Such closures, initiating 
spurious control paths, waste analysis effort and this waste compounds as 
the exploration of spurious paths leads to the discovery of yet more. 

2. The abstract time component is limited to length at most k (obtained by 
$\lfloor \cdot \rfloor_k$). 

In short, a finite address space introduces a value approximation and, in a 
higher-order language such as ours, a control approximation as well. 

While the strategy to store-allocate continuations facilitates the systematic 
abstraction process of AAM, it also imposes a similar approximation on the 
continuation space as it does the value space. In consequence, a CFA obtained 
by AAM approximates not only the value and control flow of the program, 
but the return flow as well. Return-flow approximation is manifest as a single 
abstract call returning to caller contexts that did not make that call. 

On the other hand, because the AAM abstraction process preserves the over- 
all structure of the state space—in particular, the explicit models of the local 
environment, heap, and stack—applying GC to an abstract state is straightfor- 
ward. In addition, GC in the abstract semantics improves precision and reduces 
the workload of the analyzer [10]. 

To see how GC improves precision, consider a 0CFA (that is, [k = 0]CFA) 
without GC of the Scheme program 


(let* ([id (lambda (x) x)]
       [y (id 42)]
       [z (id 35)])
  z)


at the call (id 42). As the abstract call is made, the abstract value 42 is stored 
at an address a derived from x. Once the call returns, the abstract value 42 still 
resides in the heap at a which is now unreachable. However, as the abstract call 
(id 35) is made, the address a is derived again (a consequence of the finite 
address space), and the abstract value 35 is merged with the abstract value 42 
which persists at a. Since the value at a is returned and becomes the result of 
the program, the CFA reports that the program results in either 42 or 35. 

Now consider a 0CFA with GC of the same program. Once the call (id 
42) returns and a becomes unreachable, its heap entry is reaped by GC. The 
abstract call (id 35) then allocates the abstract value 35 at a which is, from 
the allocator’s perspective, a fresh heap cell. Consequently, the CFA precisely 
reports that the program results in 35. 
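
The two runs can be mimicked with a small Python sketch. It is a deliberate simplification under stated assumptions: abstract addresses are just variable names (as in 0CFA), abstract values are sets of constants, and the live set passed to the hypothetical helper `collect` is supplied by hand rather than computed by reachability.

```python
def bind(store, addr, value):
    """Abstract allocation: join the new value with whatever is already at addr."""
    updated = dict(store)
    updated[addr] = store.get(addr, frozenset()) | {value}
    return updated


def collect(store, live):
    """Abstract GC, simplified: keep only the addresses named in `live`."""
    return {a: vs for a, vs in store.items() if a in live}


def analyze(with_gc):
    """Abstract result of the program: the set of values that z may hold."""
    store = {}
    store = bind(store, "x", 42)      # the call (id 42) binds x
    # The call returns the contents of x; y's binding is never read again,
    # so this sketch does not record it.
    if with_gc:
        # Once the call has returned, nothing reaches x any more.
        store = collect(store, live=set())
    store = bind(store, "x", 35)      # the call (id 35) derives the same address
    return store["x"]                 # the value returned, which is bound to z


without_gc = analyze(with_gc=False)
with_gc = analyze(with_gc=True)
```

Here `without_gc` contains both 42 and 35, while `with_gc` contains only 35, matching the two outcomes described above.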

The above example also illustrates how GC reduces the workload of the an- 
alyzer. Though we didn’t call it out, when using a naive continuation allocator 
without GC, the abstract call (id 35) not only correctly returns to the contin- 
uation binding z but also spuriously returns to the continuation binding y. In 
this example, this spurious control (return) flow does no more damage to the 
precision of 0CFA’s approximation of the final program result, but does cause 
it to explore infeasible control flows which damage the precision of the 0CFA’s 
approximation of intermediate values. GC prevents the spurious flows in this 
example from arising at all; however, in general, it does not prevent all spurious 
return flows. 


6 P4F [6] uses a particular continuation allocator which is able to avoid return-flow 
approximation. However, the P4F technique applies only when the store is globally- 
widened and, in such a setting, no data ever becomes unreachable which renders GC 
completely ineffective. 


3.4 Stack-Precise CFA with Garbage Collection 


In contrast to an AAM-derived analysis, a stack-precise CFA does not approx- 
imate the return flow of the program. A stack-precise CFA achieves this feat 
by modelling control flow with a pushdown system which allows it to precisely 
match returns with their corresponding calls. However, to do so, it requires full 
control of the continuation which we abide by factoring it out of the state space, 
obtaining 


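$$
\begin{aligned}
\mathit{State}_{PD} &= \mathit{Eval}_{PD} + \mathit{Apply}_{PD} \\
\mathit{Eval}_{PD} &= \mathit{Exp} \times \mathit{Env} \times \mathit{Store} \times \mathit{Time} \\
\mathit{Apply}_{PD} &= \mathit{Val} \times \mathit{Store} \times \mathit{Time}
\end{aligned}
$$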


before we abstract it to produce a CFA. (Some CFAs factor the store out of 
machine states to be managed globally, part of widening the store. In a sense, 
factoring out the continuation is part of widening the continuation.) Without 
a continuation component, an $\mathit{Eval}_{PD}$ state is an evaluation configuration and 
an $\mathit{Apply}_{PD}$ state is an evaluation result. Except for the presence of the time 
component, $\mathit{State}_{PD}$ exhibits precisely the configuration and result shapes one 
finds in many stack-precise CFAs [17, 8, 1, 18]. 

However, factoring the continuation out and ceding control of it to the anal- 
ysis presents an obstacle to abstract GC, which needs to extract the root set of 
reachable addresses from it. Earl et al. [4] developed a technique whereby the 
analysis could introspect the continuation and extract the root set of reachable 
addresses from the continuation. Johnson and Van Horn [8] reformulated this 
incomplete technique for an operational setting and offered a complete—albeit 
theoretically more-expensive—technique capable of more precision. Johnson et 
al. [7] unified these techniques within an expanded framework. Darais et al. [1] 
then showed that the Abstracting Definitional Interpreters-approach—currently 
the state of the art—is compatible with the complete technique by including the 
set of stack root addresses as a component in the evaluation configuration. 


Context Irrelevance These techniques indeed reconcile the conflicting needs 
of GC and stack-precise control yielding an analysis which enjoys the precision- 
enhancing benefits of each. However, the addition of garbage collection causes 
the resultant analysis to violate context irrelevance [8], the property that the 
evaluation of a configuration is independent of its continuation. In terms of 
the concrete semantics of Section 3.2, context irrelevance is the property that 
ev(e, ρ, σ, κ, t) →⁺ ap(σ′, κ, v) if and only if ev(e, ρ, σ, κ′, t) →⁺ ap(σ′, κ′, v) for 
any κ and κ′. 

The incomplete and complete techniques to achieve stack-precise abstract GC 
each violate context irrelevance. Under the incomplete technique, abstract GC 
prevents spurious paths from being explored and changes the store yielded by 
those that are explored. Thus, the abstract evaluation of a configuration becomes 
dependent on (the root set of reachable addresses embedded in) its continuation. 
The complete technique, achieved by introducing the set of root addresses as a 
component in the evaluation configuration, vacuously restores context irrelevance 
by distinguishing otherwise-identical configurations based on the continuation. 
That is, the states ev(e, ρ, σ, κ, t) and ev(e, ρ, σ, κ′, t) with identical configurations 
but distinct continuations become the continuation-less evaluation configurations 
ev(e, ρ, σ, A, t) and ev(e, ρ, σ, A′, t) with distinct root address sets A and A′. This 
address set is a close approximation of the continuation and effectively makes 
the control context relevant to evaluation. 


3.5 The k-CFA Context Abstraction 


In the concrete semantics, the time component t serves two purposes. The first 
purpose is to provide the allocator with a source of freshness, so that when the 
allocator must furnish a heap cell for a variable bound previously in execution, it 
is able to furnish a distinct one. Were freshness the only constraint on t, the Time 
domain could simply consist of ℕ. In anticipation of its role in the downstream 
CFA, the time component assumes a second purpose which is to capture some 
notion of the context in which execution is occurring. The hope is that the notion 
of context it captures is semantically meaningful so that, when an unbounded 
set of times are identified by the process of abstraction, each address, which 
is qualified by such an abstracted time, locates a semantically-coherent set of 
values. 

To get a better idea of what notion of context our treatment of time cap- 
tures, let’s examine how our concrete semantics treats time, as dictated by 
k-CFA. Time begins as the empty sequence ⟨⟩. It is passed unchanged across all 
Eval transitions, save one, and the Apply transition. The exception is the CALL 
transition, which instead passes the (at-most-)k-length prefix of the application 
prepended to the incoming time. Hence, the k-CFA context abstraction is the 
k-most-recent calls made in execution history. 
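
As a small worked illustration, the time update performed at a call can be written as a Python function. Representing times as tuples of call-site labels, and the name `tick`, are assumptions of this sketch.

```python
def tick(call_site, time, k):
    """Prepend the call site to the time and keep at most the first k entries."""
    return ((call_site,) + time)[:k]


# Three successive calls under k = 2, starting from the empty time.
t = ()
t = tick("c1", t, 2)
t = tick("c2", t, 2)
t = tick("c3", t, 2)
```

After the third call `t` holds only the two most recent call sites, `c3` and `c2`; with k = 0 every time collapses to the empty sequence, which is the 0CFA case.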

In Section 6.2, we consider the ramifications of threading the time component 
through evaluation and compare it to an alternative treatment. 


4 From Threaded to Compositional Stores 


In this section, we present a series of four semantics that gradually transition 
from a threaded treatment of stores without GC to a compositional treatment of 
stores with GC. We define each of these semantics in terms of big-step judgments 
of (or close to) the form σ, ρ, t ⊢ e ⇓ (v, σ′). This judgment expresses that 
the evaluation configuration consisting of the expression e under the store σ, 
environment ρ, and timestamp t evaluates to the evaluation result consisting of 
the value v and the store σ′. When discussing the evaluation of e, we will refer 
to σ as the incoming store and σ′ as the resultant store. We will also refer to 
the time component t as the binding context since, in the big-step semantics, its 
connection to the history of execution becomes more distant. 

Formulating our semantics in big-step style offers two advantages to our set- 
ting: First, we can readily express them by big-step definitional interpreters at 
which point we can apply systematic abstraction techniques [1,18] to obtain 
corresponding CFAs exhibiting perfect stack precision. Second, they emphasize 
the availability of the configuration store at the delivery point of the evalua- 
tion result; this availability is crucial to our ability to shift to a compositional 
treatment of the store. 


4.1 Threaded-Store Semantics 


To orient ourselves to the big-step setting, we present the reference semantics for 
our language in big-step style in Figure 2. This reference semantics is equivalent 
to the reference semantics given in small-step style in Section 3.2 except that 
there is no corresponding APPLY rule; its responsibility—to deliver a value to 
a continuation—is handled implicitly by the big-step formulation. In terms of 
big-step semantics, this reference semantics is characterized by the threading of 
the store through each rule; the resultant store of evaluation is the configuration 
store plus the allocation and mutation incurred during evaluation. Hence, we 
refer to this semantics as the threaded-store semantics. We use natural numbers 
as store subscripts in each rule to emphasize the store’s monotonic increase. 


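$$
\textsc{Let}\quad
\frac{
  \begin{array}{c}
  \sigma_0, \rho, t \vdash ce \Downarrow (v_0, \sigma_1) \\
  \rho' = \rho[x \mapsto t] \quad \sigma_2 = \sigma_1[(x, t) \mapsto v_0] \quad \sigma_2, \rho', t \vdash e \Downarrow (v, \sigma_3)
  \end{array}
}{
  \sigma_0, \rho, t \vdash \mathbf{let}\ x = ce\ \mathbf{in}\ e \Downarrow (v, \sigma_3)
}
$$

$$
\textsc{Call}\quad
\frac{
  \begin{array}{c}
  ((\lambda x.e, \rho_0), \sigma_1) = \mathit{aeval}(\sigma_0, \rho, ae_0) \\
  (v_1, \sigma_2) = \mathit{aeval}(\sigma_1, \rho, ae_1) \quad t' = (ae_0\ ae_1) :: t \\
  \rho' = \rho_0[x \mapsto t'] \quad \sigma_3 = \sigma_2[(x, t') \mapsto v_1] \quad \sigma_3, \rho', t' \vdash e \Downarrow (v, \sigma_4)
  \end{array}
}{
  \sigma_0, \rho, t \vdash (ae_0\ ae_1) \Downarrow (v, \sigma_4)
}
$$

$$
\textsc{Set!}\quad
\frac{(v, \sigma_1) = \mathit{aeval}(\sigma_0, \rho, ae) \quad \sigma_2 = \sigma_1[(x, \rho(x)) \mapsto v]}
     {\sigma_0, \rho, t \vdash \mathbf{set!}\ x\ ae \Downarrow ((\lambda x.x, \bot), \sigma_2)}
\qquad
\textsc{Atomic}\quad
\sigma, \rho, t \vdash ae \Downarrow \mathit{aeval}(\sigma, \rho, ae)
$$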


Fig. 2. The threaded-store semantics 


A program pr is evaluated in an initial configuration with an empty store ⊥, 
an empty environment ⊥, and an empty binding context ⟨⟩. In such a 
configuration, pr evaluates to a value v if ⊥, ⊥, ⟨⟩ ⊢ pr ⇓ (v, σ). 

The LET rule evaluates the bound call expression ce under the incoming 
environment and store. If evaluation results in a value-store pair, this incoming 
environment is extended with a binding derived from the bound variable and 
incoming binding context.⁷ The resultant store is extended with a mapping from 
that binding to the resultant value. The body expression is evaluated under 
the extended environment and store and its result becomes that of the overall 
expression. 

Contrasting the treatment of the environment and the store by the LET rule 
is instructive. On the one hand, the environment is treated compositionally: the 
incoming environment of evaluation is restored and extended after evaluation of 
the bound value. On the other hand, the store is treated non-compositionally: 
the store resulting from the evaluation of the bound expression is extended after 
it has accumulated the effects of its evaluation. 

Under this criterion, we classify the treatment of the binding context as 
compositional rather than threaded. This compositional treatment departs from typical 
practice of CFA and is the first such treatment in a stack-precise CFA to our 
knowledge. In Section 6.2, we examine the ramifications of this treatment. 

The CALL rule evaluates the atomic expressions aeg and ae, for the operator 
and argument, respectively. It then derives a new binding context, extends the 
environment and store with a binding using that context, and evaluates the oper- 
ator body under the extended environment, store, and derived binding context. 
The result of evaluating the body is that of the overall expression.

The SET! rule evaluates the atomic body expression ae and updates the 
binding of the referenced variable in the store. Its result is the identity function 
paired with the updated store. 

The ATOMIC rule evaluates an atomic expression ae using the aeval atomic 
evaluation metafunction. Foreshadowing the succeeding semantics, we define 
aeval to return a pair of its calculated value and the given store. In this seman- 
tics, the store is passed through unmodified; in forthcoming semantics, it will be 
altered according to the calculated value. Atomic evaluation is unchanged from 
the small-step semantics: 


\[
\mathit{aeval}(\sigma, \rho, x) = (\sigma(x, \rho(x)), \sigma) \qquad \mathit{aeval}(\sigma, \rho, \lambda x.e) = ((\lambda x.e, \rho|_{\lambda x.e}), \sigma)
\]
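The rules of the threaded-store semantics translate almost line for line into a recursive evaluator. The following Python sketch is illustrative only: the tuple encoding of terms, the names `fv`, `aeval` and `evaluate`, and the use of dictionaries for environments and stores are assumptions of the sketch, not part of the formal development. It also assumes that every call site is a syntactically distinct term, so that each address (x, t) is fresh when it is allocated.

```python
# Terms (hypothetical encoding): ("var", x), ("lam", x, body),
# ("app", ae0, ae1), ("set", x, ae), ("let", x, ce, body).
ID = ("lam", "x", ("var", "x"))  # the identity function returned by set!


def fv(e):
    """Free variables of a term."""
    tag = e[0]
    if tag == "var":
        return {e[1]}
    if tag == "lam":
        return fv(e[2]) - {e[1]}
    if tag == "app":
        return fv(e[1]) | fv(e[2])
    if tag == "set":
        return {e[1]} | fv(e[2])
    return fv(e[2]) | (fv(e[3]) - {e[1]})  # let


def aeval(store, env, ae):
    """Atomic evaluation: returns (value, store); the store is unchanged."""
    if ae[0] == "var":
        return store[(ae[1], env[ae[1]])], store
    # A lambda closes over the restriction of env to its free variables.
    return (ae, {x: env[x] for x in fv(ae)}), store


def evaluate(store, env, t, e):
    """Threaded-store big-step evaluation: returns (value, final store)."""
    tag = e[0]
    if tag == "let":  # LET: the store is threaded, the environment is not
        _, x, ce, body = e
        v0, s1 = evaluate(store, env, t, ce)
        return evaluate({**s1, (x, t): v0}, {**env, x: t}, t, body)
    if tag == "app":  # CALL
        (lam, env0), s1 = aeval(store, env, e[1])
        v1, s2 = aeval(s1, env, e[2])
        t1 = (e,) + t  # new binding context: call site consed onto t
        x, body = lam[1], lam[2]
        return evaluate({**s2, (x, t1): v1}, {**env0, x: t1}, t1, body)
    if tag == "set":  # SET!
        v, s1 = aeval(store, env, e[2])
        return (ID, {}), {**s1, (e[1], env[e[1]]): v}
    return aeval(store, env, e)  # ATOMIC


# let x = (lambda a. a) in let d = (set! x (lambda b. b)) in x
A, B = ("lam", "a", ("var", "a")), ("lam", "b", ("var", "b"))
prog = ("let", "x", A, ("let", "d", ("set", "x", B), ("var", "x")))
value, final_store = evaluate({}, {}, (), prog)
print(value)  # the closure for (lambda b. b)
```

The mutation performed while evaluating the bound expression of the inner let is visible in the body because the store returned by that evaluation is the one passed on, which is exactly the non-compositional treatment described above.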


4.2 Threaded-Store Semantics with Effect Log 


The second semantics enhances the reference semantics with an effect log $\xi$ which
explicitly records the allocation and mutation that occurs through evaluation. 
The effect log is considered part of the evaluation result; accordingly the effect log 
semantics are in terms of judgments of the form $\sigma_0, \rho, t \vdash e \Downarrow_{!} (v, \sigma'), \xi$. Figure 3
presents the effect log semantics, identical to the reference semantics except for 
(1) the addition of the effect log and (2) the use of the metavariable a to denote 
an address (x,t). (This usage persists in all subsequent semantics as well.) 

The effect log is represented by a function from store to store. The definition 
of each log is given by either a literal identity function, a use of the $\mathit{extend}_{log}$


7 Because the program is alphatised, the binding of a let-bound variable in a particular
calling context will not interfere with the binding of any other variable. 
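LET
\[
\frac{\begin{array}{c}
\sigma_0, \rho, t \vdash ce \Downarrow_{!} (v_0, \sigma_1), \xi_0 \\
\rho' = \rho[x \mapsto t] \qquad \sigma_2 = \sigma_1[(x, t) \mapsto v_0] \qquad \sigma_2, \rho', t \vdash e \Downarrow_{!} (v, \sigma_3), \xi_1
\end{array}}{\sigma_0, \rho, t \vdash \mathtt{let}\ x = ce\ \mathtt{in}\ e \Downarrow_{!} (v, \sigma_3),\ \xi_1 \circ \mathit{extend}_{log}((x, t), v_0, \sigma_1) \circ \xi_0}
\]

CALL
\[
\frac{\begin{array}{c}
((\lambda x.e, \rho_0), \sigma_1) = \mathit{aeval}(\sigma_0, \rho, ae_0) \\
(v_1, \sigma_2) = \mathit{aeval}(\sigma_1, \rho, ae_1) \qquad t' = (ae_0\ ae_1) :: t \\
\rho_1 = \rho_0[x \mapsto t'] \qquad \sigma_3 = \sigma_2[(x, t') \mapsto v_1] \qquad \sigma_3, \rho_1, t' \vdash e \Downarrow_{!} (v, \sigma_4), \xi
\end{array}}{\sigma_0, \rho, t \vdash (ae_0\ ae_1) \Downarrow_{!} (v, \sigma_4),\ \xi \circ \mathit{extend}_{log}((x, t'), v_1, \sigma_2)}
\]

SET!
\[
\frac{\begin{array}{c}
(v, \sigma_1) = \mathit{aeval}(\sigma_0, \rho, ae) \\
a = (x, \rho(x)) \qquad \sigma_1' = \sigma_1[a \mapsto v]
\end{array}}{\sigma_0, \rho, t \vdash \mathtt{set!}\ x\ ae \Downarrow_{!} ((\lambda x.x, \bot), \sigma_1'),\ \mathit{extend}_{log}(a, v, \sigma_1')}
\]

ATOMIC
\[
\frac{}{\sigma, \rho, t \vdash ae \Downarrow_{!} \mathit{aeval}(\sigma, \rho, ae),\ \lambda\sigma.\sigma}
\]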


Fig. 3. Threaded-store semantics with an effect log 


metafunction, or the composition of effect logs. The $\mathit{extend}_{log}$ metafunction is
defined


\[
\mathit{extend}_{log}(a, v, \sigma') = \lambda\sigma.\ \sigma[a \mapsto v] \cup \sigma'
\]


where the union of the extended store $\sigma[a \mapsto v]$ and the value-associated store
$\sigma'$ treats each store extensionally as a set of pairs, but the result is always a
function (i.e. any given address is paired with at most one value). The effect
log of the ATOMIC rule is the identity function, reflecting that no allocation or
mutation is performed when evaluating an atomic expression. The effect log of
the SET! rule is constructed by the metafunction $\mathit{extend}_{log}$; the store argument
to $\mathit{extend}_{log}$ is the store after the mutation has occurred. The use of this store is
necessary to propagate the mutative effect and ensures that its union with the
store on which this log is replayed agrees on all common bindings. The effect log
of the CALL rule is composed of the effect log of evaluation of the body and an 
entry for the allocation of the bound variable. Finally, the effect log of the LET 
rule is composed of the effect logs of evaluation of both the body and binding 
expression interposed by an entry for the allocation of the bound variable. 

In this semantics (and the next), the bindings in $\sigma'$ are redundant: once
$\mathit{extend}_{log}$ applies the mutative or allocative binding to its argument $\sigma$, $\sigma$
already contains all the bindings of $\sigma'$. Once we introduce GC to the semantics,
however, this will no longer be the case. 

The intended role of the effect log is captured by the following lemma, which 
states that one may obtain the resultant store by applying the resultant log to 
the initial store of evaluation. 
Lemma 1. If $\sigma, \rho, t \vdash e \Downarrow_{!} (v, \sigma'), \xi$, then $\sigma' = \xi(\sigma)$.


The proof proceeds straightforwardly by induction on the judgment’s derivation. 
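To make the role of the log concrete, the sketch below models an effect log as a Python function from stores to stores and replays a hand-built log. The helper names `union`, `extend_log` and `compose`, the dictionary encoding of stores, and the use of the strings "A" and "B" as stand-ins for closures are all assumptions of this sketch. The log entries are those that the LET, SET! and ATOMIC rules would produce for `let x = A in (set! x B)` under the empty binding context.

```python
def union(s1, s2):
    """Union of two stores viewed as sets of pairs; they must agree on overlap."""
    for a in s1.keys() & s2.keys():
        assert s1[a] == s2[a], f"stores disagree at {a}"
    return {**s1, **s2}


def extend_log(a, v, store_v):
    """Log entry: bind a to v in the replayed store, then merge in store_v."""
    return lambda store: union({**store, a: v}, store_v)


def compose(*logs):
    """Right-to-left composition, so compose(f, g)(s) == f(g(s))."""
    def run(store):
        for log in reversed(logs):
            store = log(store)
        return store
    return run


identity = lambda store: store  # the log of the ATOMIC rule

# Hand-built log for: let x = A in (set! x B), evaluated with t = ().
a = ("x", ())
xi0 = identity                       # evaluating the atomic expression A
alloc = extend_log(a, "A", {})       # LET allocates x; the store before it is empty
xi1 = extend_log(a, "B", {a: "B"})   # SET! passes the post-mutation store
log = compose(xi1, alloc, xi0)

print(log({}))  # the store obtained by replaying the log on the initial store
```

Replaying the log on the empty initial store yields the final store of the evaluation, as Lemma 1 states. If the SET! entry were instead given the store from before the mutation, `union` would find a disagreement at the address of x and its assertion would fail, which illustrates why the rule passes the post-mutation store.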


4.3 Compositional-Store Semantics 


The third semantics (seen in Figure 4) shifts the previous semantics from thread- 
ing the store to treating it compositionally. Under this treatment, evaluation 
results still consist of a value, store, and effect log, but the store is associated 
directly to the value—at least conceptually—and not treated as a global effect 
repository. This alternative role is particularly apparent in the LET rule: the 
store resulting from evaluation of the bound expression is not extended to be 
used as the initial store of evaluation of the body. Instead, the effect log resulting 
from evaluation of the bound expression is applied to the initial store (of the 
overall let expression). We emphasize this compositional treatment by no longer 
using numeric subscripts, which suggest “evolution” of the store, and instead 
using ticks, which suggest distinct (but related) instances. 


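LET
\[
\frac{\begin{array}{c}
\sigma, \rho, t \vdash ce \Downarrow_{\circ} (v', \sigma_{v'}), \xi' \qquad \sigma' = \xi'(\sigma) \\
(\rho', \sigma'') = \mathit{extend}(\rho, \sigma', x, t, v', \sigma_{v'}) \qquad \sigma'', \rho', t \vdash e \Downarrow_{\circ} (v, \sigma_v), \xi
\end{array}}{\sigma, \rho, t \vdash \mathtt{let}\ x = ce\ \mathtt{in}\ e \Downarrow_{\circ} (v, \sigma_v),\ \xi \circ \mathit{extend}_{log}((x, t), v', \sigma_{v'}) \circ \xi'}
\]

CALL
\[
\frac{\begin{array}{c}
((\lambda x.e, \rho_0), \sigma_0) = \mathit{aeval}(\sigma, \rho, ae_0) \qquad (v_1, \sigma_1) = \mathit{aeval}(\sigma, \rho, ae_1) \qquad t' = (ae_0\ ae_1) :: t \\
(\rho', \sigma') = \mathit{extend}(\rho_0, \sigma_0, x, t', v_1, \sigma_1) \qquad \sigma', \rho', t' \vdash e \Downarrow_{\circ} (v, \sigma_v), \xi
\end{array}}{\sigma, \rho, t \vdash (ae_0\ ae_1) \Downarrow_{\circ} (v, \sigma_v),\ \xi \circ \mathit{extend}_{log}((x, t'), v_1, \sigma_1)}
\]

SET!
\[
\frac{\begin{array}{c}
(v, \sigma_v) = \mathit{aeval}(\sigma, \rho, ae) \\
a = (x, \rho(x)) \qquad \sigma' = \sigma_v[a \mapsto v]
\end{array}}{\sigma, \rho, t \vdash \mathtt{set!}\ x\ ae \Downarrow_{\circ} ((\lambda x.x, \bot), \sigma'),\ \mathit{extend}_{log}(a, v, \sigma')}
\]

ATOMIC
\[
\frac{}{\sigma, \rho, t \vdash ae \Downarrow_{\circ} \mathit{aeval}(\sigma, \rho, ae),\ \lambda\sigma.\sigma}
\]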


Fig. 4. The compositional-store semantics 


We use the extend metafunction to bind a value $v$ (with an associated store
$\sigma_v$) to a variable $x$ in a given binding context $t$ within a given environment $\rho$
and store $\sigma$, defined


\[
\mathit{extend}(\rho, \sigma, x, t, v, \sigma_v) = (\rho[x \mapsto t],\ \sigma[(x, t) \mapsto v] \cup \sigma_v)
\]

When we extend $\sigma$ with a mapping for $v$, we also copy all of the mappings from
$\sigma_v$. This copying will yield a well-formed store since $\sigma[(x, t) \mapsto v]$ and $\sigma_v$ agree
on any common bindings.

Although the role of the store has changed, the same lemma holds in this
semantics as in the previous one. We repeat it in terms of this semantics.


Lemma 2. If $\sigma, \rho, t \vdash e \Downarrow_{\circ} (v, \sigma_v), \xi$, then $\xi(\sigma) = \sigma_v$.


Like the previous lemma, its proof can be obtained by induction on the 
judgment’s derivation. 
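The compositional treatment can be sketched as an evaluator that returns a value, the store associated with that value, and an effect log, and that replays the log of the bound expression on the initial store in the LET case. As before, this Python sketch is an illustration under assumptions: the tuple encoding of terms, the dictionary stores, the helper names, and the requirement that call sites be syntactically distinct are not taken from the formal development.

```python
# Terms (hypothetical encoding): ("var", x), ("lam", x, body),
# ("app", ae0, ae1), ("set", x, ae), ("let", x, ce, body).
ID = ("lam", "x", ("var", "x"))


def fv(e):
    """Free variables of a term."""
    tag = e[0]
    if tag == "var":
        return {e[1]}
    if tag == "lam":
        return fv(e[2]) - {e[1]}
    if tag == "app":
        return fv(e[1]) | fv(e[2])
    if tag == "set":
        return {e[1]} | fv(e[2])
    return fv(e[2]) | (fv(e[3]) - {e[1]})  # let


def union(s1, s2):
    """Union of stores as sets of pairs; checks agreement on common addresses."""
    assert all(s1[a] == s2[a] for a in s1.keys() & s2.keys()), "stores disagree"
    return {**s1, **s2}


def extend(env, store, x, t, v, store_v):
    """Bind x at context t to v and copy the bindings of v's store."""
    return {**env, x: t}, union({**store, (x, t): v}, store_v)


def extend_log(a, v, store_v):
    return lambda store: union({**store, a: v}, store_v)


def aeval(store, env, ae):
    if ae[0] == "var":
        return store[(ae[1], env[ae[1]])], store
    return (ae, {x: env[x] for x in fv(ae)}), store


def evaluate(store, env, t, e):
    """Compositional-store evaluation: returns ((value, store_v), log)."""
    tag = e[0]
    if tag == "let":  # LET
        _, x, ce, body = e
        (v1, sv1), log1 = evaluate(store, env, t, ce)
        s1 = log1(store)  # replay the effects on the *initial* store
        env2, s2 = extend(env, s1, x, t, v1, sv1)
        result, log2 = evaluate(s2, env2, t, body)
        entry = extend_log((x, t), v1, sv1)
        return result, lambda s: log2(entry(log1(s)))
    if tag == "app":  # CALL
        (lam, env0), s0 = aeval(store, env, e[1])
        v1, s1 = aeval(store, env, e[2])
        t1 = (e,) + t
        env2, s2 = extend(env0, s0, lam[1], t1, v1, s1)
        result, log = evaluate(s2, env2, t1, lam[2])
        entry = extend_log((lam[1], t1), v1, s1)
        return result, lambda s: log(entry(s))
    if tag == "set":  # SET!
        v, sv = aeval(store, env, e[2])
        a = (e[1], env[e[1]])
        s1 = {**sv, a: v}
        return ((ID, {}), s1), extend_log(a, v, s1)
    return aeval(store, env, e), lambda s: s  # ATOMIC


# let x = (lambda a. a) in let d = (set! x (lambda b. b)) in (x x)
A, B = ("lam", "a", ("var", "a")), ("lam", "b", ("var", "b"))
call = ("app", ("var", "x"), ("var", "x"))
prog = ("let", "x", A, ("let", "d", ("set", "x", B), call))
(value, store_v), log = evaluate({}, {}, (), prog)
print(log({}) == store_v)  # the property stated by Lemma 2, for this run
```

On this program the log replayed on the empty initial store coincides with the store returned alongside the value, which is one instance of the lemma rather than a proof of it.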


4.4 Compositional-Store Semantics with Garbage Collection 


Our final semantics (seen in Figure 5) continues the compositional treatment of 
the store but GCs stores to remove irrelevant bindings. Under this compositional 
treatment, the role of the store is to model the fragment of the heap which is 
reachable from an associated environment: the store of a configuration closes the 
associated environment and the store of a result closes the environment of the 
associated value. Accordingly, the root set of reachability used by GC includes 
the addresses of the closed environment only and, in particular, does not include 
addresses from the continuation. We define reachability just as we did for GC in 
Section 3.2, using the $\mathit{root}_v$ and $\mathit{root}_\rho$ metafunctions to extract a root set from
a value and environment, respectively. 

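In this semantics, we use a modified atomic evaluation function $\mathit{aeval}_{gc}$ which
garbage-collects the store associated with a value. It is defined


\[
\mathit{aeval}_{gc}(\sigma, \rho, x) = (v, \mathit{gc}(v, \sigma)) \quad \text{where } v = \sigma(x, \rho(x))
\]

\[
\mathit{aeval}_{gc}(\sigma, \rho, \lambda x.e) = (v, \mathit{gc}(v, \sigma)) \quad \text{where } v = (\lambda x.e, \rho|_{\lambda x.e})
\]


where $\mathit{gc}(v, \sigma)$ prunes the unreachable bindings from $\sigma$ with respect to $v$.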

This semantics uses the restrict metafunction frequently to ensure that each
evaluation is performed under a store which contains no values unreachable
from the environment. For a given expression $e$, closing environment
$\rho$, and closing store $\sigma$, the restrict metafunction first determines the restriction
$\rho|_e$ of $\rho$ to the free variables of $e$ and then the bindings of $\sigma$ reachable from $\rho|_e$;
it then garbage-collects the store by pruning unreachable bindings. Formally,
restrict is defined


\[
\mathit{restrict}(e, \rho, \sigma) = (\rho|_e,\ \mathit{gc}(\rho|_e, \sigma))
\]


where $\mathit{gc}(\rho, \sigma)$ prunes the unreachable bindings from $\sigma$ with respect to $\rho$.
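The two flavours of gc and the restrict metafunction can be sketched as a small reachability computation. The sketch below is a hedged illustration: the definition of reachability it uses (follow the environment of each stored closure) is the standard one and is assumed here to match the definition the text refers to, and the term encoding and function names are hypothetical.

```python
# Terms (hypothetical encoding): ("var", x), ("lam", x, body),
# ("app", ae0, ae1), ("set", x, ae), ("let", x, ce, body).
def fv(e):
    """Free variables of a term."""
    tag = e[0]
    if tag == "var":
        return {e[1]}
    if tag == "lam":
        return fv(e[2]) - {e[1]}
    if tag == "app":
        return fv(e[1]) | fv(e[2])
    if tag == "set":
        return {e[1]} | fv(e[2])
    return fv(e[2]) | (fv(e[3]) - {e[1]})  # let


def reachable(roots, store):
    """Addresses reachable from roots by following closure environments."""
    seen, work = set(), list(roots)
    while work:
        a = work.pop()
        if a in seen or a not in store:
            continue
        seen.add(a)
        _lam, env = store[a]      # a stored value is a closure (lam, env)
        work.extend(env.items())  # each (x, t) in its environment is an address
    return seen


def gc_env(env, store):
    """gc with respect to an environment: keep only reachable bindings."""
    return {a: store[a] for a in reachable(env.items(), store)}


def gc_val(v, store):
    """gc with respect to a closure value: its environment is the root set."""
    return gc_env(v[1], store)


def restrict(e, env, store):
    """Restrict env to the free variables of e, then collect the store."""
    env_e = {x: env[x] for x in fv(e) if x in env}
    return env_e, gc_env(env_e, store)


ID = ("lam", "x", ("var", "x"))
K = ("lam", "z", ("var", "y"))           # a lambda whose body mentions y
store = {
    ("f", ()): (K, {"y": ()}),           # f's closure keeps y alive
    ("y", ()): (ID, {}),
    ("u", ()): (ID, {}),                 # not needed to evaluate f
}
env = {"f": (), "u": ()}
env_e, store_e = restrict(("var", "f"), env, store)
print(sorted(store_e))  # the addresses that survive the restriction
```

Restricting to the expression `f` drops the binding of `u` from both the environment and the store, while the binding of `y` survives because it is reachable through the closure bound to `f`.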

The LET rule proceeds by first obtaining the restriction of the environment
and store with respect to the bound expression ce, before evaluating ce under
that restriction. The evaluation of ce produces a value $v'$, an associated store $\sigma_{v'}$
which closes only that value, and an effect log $\xi'$. The LET rule then replays the
effect log $\xi'$ on the initial store $\sigma$, thereby accumulating any mutation (and allocation
on which it depends) which occurred. After replaying the log, it extends
the resultant store $\sigma'$ and initial environment $\rho$ with a binding for $v'$ and copies
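LET
\[
\frac{\begin{array}{c}
(\rho_{ce}, \sigma_{ce}) = \mathit{restrict}(ce, \rho, \sigma) \\
\sigma_{ce}, \rho_{ce}, t \vdash ce \Downarrow_{gc} (v', \sigma_{v'}), \xi' \qquad \sigma' = \xi'(\sigma) \qquad (\rho', \sigma'') = \mathit{extend}(\rho, \sigma', x, t, v', \sigma_{v'}) \\
(\rho_e, \sigma_e) = \mathit{restrict}(e, \rho', \sigma'') \qquad \sigma_e, \rho_e, t \vdash e \Downarrow_{gc} (v, \sigma_v), \xi
\end{array}}{\sigma, \rho, t \vdash \mathtt{let}\ x = ce\ \mathtt{in}\ e \Downarrow_{gc} (v, \sigma_v),\ \xi \circ \mathit{extend}_{log}((x, t), v', \sigma_{v'}) \circ \xi'}
\]

CALL
\[
\frac{\begin{array}{c}
((\lambda x.e, \rho_0), \sigma_0) = \mathit{aeval}_{gc}(\sigma, \rho, ae_0) \qquad (v_1, \sigma_1) = \mathit{aeval}_{gc}(\sigma, \rho, ae_1) \\
t' = (ae_0\ ae_1) :: t \qquad (\rho', \sigma') = \mathit{extend}(\rho_0, \sigma_0, x, t', v_1, \sigma_1) \\
(\rho_e, \sigma_e) = \mathit{restrict}(e, \rho', \sigma') \qquad \sigma_e, \rho_e, t' \vdash e \Downarrow_{gc} (v, \sigma_v), \xi
\end{array}}{\sigma, \rho, t \vdash (ae_0\ ae_1) \Downarrow_{gc} (v, \sigma_v),\ \xi \circ \mathit{extend}_{log}((x, t'), v_1, \sigma_1)}
\]

SET!
\[
\frac{\begin{array}{c}
(v, \sigma_v) = \mathit{aeval}_{gc}(\sigma, \rho, ae) \\
a = (x, \rho(x)) \qquad \sigma' = \sigma_v[a \mapsto v]
\end{array}}{\sigma, \rho, t \vdash \mathtt{set!}\ x\ ae \Downarrow_{gc} ((\lambda x.x, \bot), \bot),\ \mathit{extend}_{log}(a, v, \sigma')}
\]

ATOMIC
\[
\frac{}{\sigma, \rho, t \vdash ae \Downarrow_{gc} \mathit{aeval}_{gc}(\sigma, \rho, ae),\ \lambda\sigma.\sigma}
\]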


Fig. 5. The compositional-store semantics with garbage collection 


the bindings of its associated store $\sigma_{v'}$. Finally, the extended environment and
store are restricted with respect to the body expression e before e’s evaluation 
under them. 

The CALL rule proceeds by first evaluating the atomic operator and argument 
expressions. After calculating the new binding context t’, the operator value 
environment and store are extended with the new binding. Before evaluation of 
the body e commences, the extended environment and store are restricted with 
respect to it. 

The SET! rule atomically evaluates the expression ae producing the assigned 
value. It returns the identity function which, with an empty environment, is 
closed by an empty store. 

The ATOMIC rule evaluates an atomic expression with $\mathit{aeval}_{gc}$.

To connect this semantics to the previous, we show that the addition of GC 
has no semantic effect by the following lemma. 


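Lemma 3. If $\sigma, \rho, t \vdash e \Downarrow_{\circ} (v, \sigma_v), \xi$ and $\sigma' = \mathit{gc}(\rho|_e, \sigma)$, then $\sigma', \rho, t \vdash e \Downarrow_{gc}
(v, \sigma_v'), \xi$ where $\sigma_v' = \mathit{gc}(v, \sigma_v)$.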


In prose, this lemma states that two evaluation configurations, identical ex- 
cept that one’s store is the other’s with unreachable bindings pruned, will yield 
the same evaluation result: their evaluation will produce the same value and, 
modulo unreachable bindings, the same closing store. 
5 Abstract Compositional-Store Semantics with Garbage
Collection 


We now abstract the compositional-store semantics with GC—the final seman- 
tics of the preceding section. Abstracting the semantics involves (1) defining a 
finite counterpart of each component of the evaluation configuration and result 
and (2) defining a counterpart of each semantic rule in terms of these finite 
components. With each component of the configuration finite, configurations 
themselves become finite. Then we show that each abstracted rule simulates its 
counterpart—that it admits the full range of its counterpart’s behavior. Doing 
this for each rule ensures that the abstract semantics includes every behavior in- 
cluded by the exact semantics. Once that’s complete, we can directly implement 
our big-step semantics in an abstract definitional interpreter [1, 18] to obtain our 
stack-precise CFA with GC. 
We begin by abstracting each configuration component. 


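\[
\begin{array}{ll}
\hat{v} \in \widehat{\mathit{Val}} = \mathcal{P}(\mathit{Lam} \times \widehat{\mathit{Env}}) & \hat{\rho} \in \widehat{\mathit{Env}} = \mathit{Var} \rightharpoonup \widehat{\mathit{Time}} \\
\hat{t} \in \widehat{\mathit{Time}} = \mathit{App}^{\le m} & \hat{a} \in \widehat{\mathit{Address}} = \mathit{Var} \times \widehat{\mathit{Time}} \\
\hat{\sigma} \in \widehat{\mathit{Store}} = \widehat{\mathit{Address}} \to \widehat{\mathit{Val}} & \hat{\xi} \in \widehat{\mathit{Log}} = \widehat{\mathit{Address}} \rightharpoonup \widehat{\mathit{Val}}
\end{array}
\]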


Like its concrete counterpart, an abstract store $\hat{\sigma}$ maps an abstract address to
an abstract value. Abstract addresses remain a pair of a variable and binding
context, only the context is abstract. An abstract value $\hat{v}$, however, is a set of
abstract closures rather than a single closure. An abstract closure is a $\lambda$ paired
with an abstract environment $\hat{\rho}$ which itself is a finite map from variables to
binding contexts. An abstract timestamp $\hat{t}$ is a sequence of at most m application
sites, where m is a parameter to the analysis.⁸ An abstract log $\hat{\xi}$ is an extensional
account of the added and modified store mappings relative to the initial store,
and takes the same form as an abstract store itself. We define abstract join,
composition, and application operators by


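\[
\hat{\sigma}_0 \sqcup \hat{\sigma}_1 = \lambda\hat{a}.\ \hat{\sigma}_0(\hat{a}) \cup \hat{\sigma}_1(\hat{a}) \qquad \hat{\xi} \circ \hat{\xi}' = \hat{\xi} \sqcup \hat{\xi}' \qquad \hat{\xi}(\hat{\sigma}) = \hat{\sigma} \sqcup \hat{\xi}
\]

To help show that the abstract semantics simulates the concrete, we make
a connection between the state space of the abstract and that of the concrete.
We make this connection by means of a polymorphic abstraction function $|\cdot|$,⁹
defined for all domains except stores by


\[
|\rho| = \lambda x.|\rho(x)| \qquad |t| = \lfloor t \rfloor_m \qquad |(\lambda x.e, \rho)| = \{(\lambda x.e, |\rho|)\} \qquad |\xi| = |\xi(\bot)|
\]


and for stores by


\[
|\sigma| = \lambda\hat{a}.\ \bigcup_{|a| = \hat{a}} |\sigma(a)|
\]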


8 The parameter m is used similarly to the parameter k of k-CFA. 

9 The abstraction function is typically accompanied by a complementary concretization
function to complete a Galois connection. For simplicity here, we leave it
incomplete.
Abstracting a store groups entries by their abstracted address, collecting the
values of each group into a single set.
Abstracting an environment $\rho$ abstracts its range. Abstracting a binding context
$t$ takes its at-most-m-length prefix. Abstracting a closure produces a singleton
of that closure with an abstracted environment. Finally, abstracting a log $\xi$
produces the abstract store that results from applying the log to the empty store
$\bot$ and then abstracting.
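The abstraction function can be read as a handful of small conversions. The Python sketch below is one possible rendering, in which concrete binding contexts are tuples of call sites, abstract environments and abstract values are frozensets (so that they can be members of sets), and effect logs are functions from stores to stores; these representation choices and all function names are assumptions of the sketch.

```python
def abs_time(t, m):
    """|t|: the at-most-m-length prefix of a binding context."""
    return t[:m]


def abs_env(env, m):
    """|rho|: abstract each binding context in the range (frozen for hashing)."""
    return frozenset((x, abs_time(t, m)) for x, t in env.items())


def abs_val(v, m):
    """|(lam, rho)|: a singleton set holding the abstracted closure."""
    lam, env = v
    return frozenset({(lam, abs_env(env, m))})


def abs_store(store, m):
    """|sigma|: join the abstracted values of all addresses that collide."""
    out = {}
    for (x, t), v in store.items():
        a = (x, abs_time(t, m))
        out[a] = out.get(a, frozenset()) | abs_val(v, m)
    return out


def abs_log(log, m):
    """|xi|: apply the log to the empty store, then abstract the result."""
    return abs_store(log({}), m)


ID = ("lam", "x", ("var", "x"))
K = ("lam", "z", ("var", "z"))
# Two concrete bindings of x whose contexts share only their first call site.
store = {
    ("x", ("c1", "c2")): (ID, {}),
    ("x", ("c1", "c3")): (K, {}),
}
print(len(abs_store(store, 1)), len(abs_store(store, 2)))
```

With m = 1 the two concrete addresses collapse to a single abstract address whose value is a two-element set of closures; with m = 2 they remain distinct.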


Figure 6 defines the abstract compositional-store semantics with garbage 
collection. Structurally, nearly every rule is identical to the exact counterpart 
that it abstracts; most of the work of abstraction is defining the abstract domains 
and metafunctions and connecting them to those of the exact semantics. The 
CALL rule differs structurally from its exact counterpart in two notable ways: 
First, because an abstract value is a set of closures, it applies for each such 
closure in the operator set. Second, it defines the new binding context $\hat{t}'$ to be
the application site prepended to the previous abstract time $\hat{t}$,
truncated to a length of at most m. The abstract aeval metafunction is defined


\[
\widehat{\mathit{aeval}}(\hat{\sigma}, \hat{\rho}, x) = (\hat{v}, \widehat{\mathit{gc}}(\hat{v}, \hat{\sigma})) \quad \text{where } \hat{v} = \hat{\sigma}(x, \hat{\rho}(x))
\]


We omit the straightforward definitions of the abstract variants of gc, restrict,
and extend.
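LET
\[
\frac{\begin{array}{c}
(\hat{\rho}_{ce}, \hat{\sigma}_{ce}) = \widehat{\mathit{restrict}}(ce, \hat{\rho}, \hat{\sigma}) \\
\hat{\sigma}_{ce}, \hat{\rho}_{ce}, \hat{t} \vdash ce \mathrel{\hat{\Downarrow}} (\hat{v}', \hat{\sigma}_{v'}), \hat{\xi}' \qquad \hat{\sigma}' = \hat{\xi}'(\hat{\sigma}) \qquad (\hat{\rho}', \hat{\sigma}'') = \widehat{\mathit{extend}}(\hat{\rho}, \hat{\sigma}', x, \hat{t}, \hat{v}', \hat{\sigma}_{v'}) \\
(\hat{\rho}_e, \hat{\sigma}_e) = \widehat{\mathit{restrict}}(e, \hat{\rho}', \hat{\sigma}'') \qquad \hat{\sigma}_e, \hat{\rho}_e, \hat{t} \vdash e \mathrel{\hat{\Downarrow}} (\hat{v}, \hat{\sigma}_v), \hat{\xi}
\end{array}}{\hat{\sigma}, \hat{\rho}, \hat{t} \vdash \mathtt{let}\ x = ce\ \mathtt{in}\ e \mathrel{\hat{\Downarrow}} (\hat{v}, \hat{\sigma}_v),\ \hat{\xi} \circ \hat{\xi}'}
\]

CALL
\[
\frac{\begin{array}{c}
(\hat{v}_0, \hat{\sigma}_0) = \widehat{\mathit{aeval}}(\hat{\sigma}, \hat{\rho}, ae_0) \qquad (\lambda x.e, \hat{\rho}_0) \in \hat{v}_0 \\
(\hat{v}_1, \hat{\sigma}_1) = \widehat{\mathit{aeval}}(\hat{\sigma}, \hat{\rho}, ae_1) \\
\hat{t}' = \lfloor (ae_0\ ae_1) :: \hat{t} \rfloor_m \qquad (\hat{\rho}', \hat{\sigma}') = \widehat{\mathit{extend}}(\hat{\rho}_0, \hat{\sigma}_0, x, \hat{t}', \hat{v}_1, \hat{\sigma}_1) \\
(\hat{\rho}_e, \hat{\sigma}_e) = \widehat{\mathit{restrict}}(e, \hat{\rho}', \hat{\sigma}') \qquad \hat{\sigma}_e, \hat{\rho}_e, \hat{t}' \vdash e \mathrel{\hat{\Downarrow}} (\hat{v}, \hat{\sigma}_v), \hat{\xi}
\end{array}}{\hat{\sigma}, \hat{\rho}, \hat{t} \vdash (ae_0\ ae_1) \mathrel{\hat{\Downarrow}} (\hat{v}, \hat{\sigma}_v),\ \hat{\xi}}
\]

SET!
\[
\frac{\begin{array}{c}
(\hat{v}, \hat{\sigma}_v) = \widehat{\mathit{aeval}}(\hat{\sigma}, \hat{\rho}, ae) \\
(\_, \hat{\xi}) = \widehat{\mathit{extend}}(\bot, \bot, x, \hat{\rho}(x), \hat{v}, \hat{\sigma}_v)
\end{array}}{\hat{\sigma}, \hat{\rho}, \hat{t} \vdash \mathtt{set!}\ x\ ae \mathrel{\hat{\Downarrow}} (\{(\lambda x.x, \bot)\}, \bot),\ \hat{\xi}}
\]

ATOMIC
\[
\frac{}{\hat{\sigma}, \hat{\rho}, \hat{t} \vdash ae \mathrel{\hat{\Downarrow}} \widehat{\mathit{aeval}}(\hat{\sigma}, \hat{\rho}, ae),\ \bot}
\]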


Fig. 6. The abstract compositional-store semantics with garbage collection 
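As a final step before we establish the simulation relationship, we define an
ordering on stores (and logs, extending it in the natural way):


\[
\hat{\sigma}_0 \sqsubseteq \hat{\sigma}_1 \iff \forall \hat{a} \in \widehat{\mathit{Address}}.\ \hat{\sigma}_0(\hat{a}) \subseteq \hat{\sigma}_1(\hat{a}) \qquad \hat{v}_0 \sqsubseteq \hat{v}_1 \iff \hat{v}_0 \subseteq \hat{v}_1
\]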


We formally connect this abstract semantics with the concrete compositional- 
store semantics given in Section 4.4 by the following abstraction theorem. 


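Theorem 1. If $|\sigma| \sqsubseteq \hat{\sigma}$ and $|\rho| = \hat{\rho}$ and $|t| = \hat{t}$ and $\sigma, \rho, t \vdash e \Downarrow_{gc} (v, \sigma_v), \xi$, then
$\hat{\sigma}, \hat{\rho}, \hat{t} \vdash e \mathrel{\hat{\Downarrow}} (\hat{v}, \hat{\sigma}_v), \hat{\xi}$ where $|v| \sqsubseteq \hat{v}$ and $|\sigma_v| \sqsubseteq \hat{\sigma}_v$ and $|\xi| \sqsubseteq \hat{\xi}$.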


This theorem states that if the configuration components are related by abstraction,
then, for any given derivation in the exact semantics, there is a derivation
in the abstract semantics which yields an abstraction of its results. It can
be proved by induction on the derivation. 


6 Discussion 


Now we examine the ramifications of a compositional treatment of analysis com- 
ponents. We do so in turn, first considering the ramifications of treating the store 
compositionally and then of treating the time compositionally. 


6.1 The Effects of Treating the Store Compositionally 


We saw in Section 4.3 that a semantics could treat stores compositionally without 
employing GC. In this case, the caller’s store and callee’s final store agreed on 
common entries and combining them produced the same store as the threaded- 
store semantics. However, the compositional machinery liberates evaluation from 
the stack. With evaluation so-liberated, GC need not preserve any heap data 
reachable solely from the stack. This relaxation 


1. simplifies GC and increases its effectiveness; 
2. leads to general yet precise summaries; and 
3. restores context irrelevance under GC. 


We discuss each of these aspects in more detail. 


Simplified and More-Effective Garbage Collection Classical abstract GC 
and its succeeding pushdown GC each preserve heap data reachable from both 
the local environment and the stack. Once the collector has determined the root set of
reachable addresses from these two components, it determines the transitive
closure of reachability. When GC is performed with respect to only the local
environment, both the initial root set and its transitive closure are smaller and 
it requires less work to calculate them. If the CFA employs incomplete garbage 
collection [8], the garbage collector is also freed from calculating the root set 
of stack addresses as a fixed point. A smaller transitive closure of reachable 
addresses is not only less costly to calculate but also leads to more collected 
garbage. 
General Yet Precise Summaries A stack-precise CFA without GC will
falsely distinguish abstract evaluations of the same call which are identical mod- 
ulo GC-able heap data. In such cases, the addition of pushdown GC will allow the 
CFA to identify them. However, even with pushdown GC, a stack-precise CFA 
will falsely distinguish abstract evaluations of the same call which are identical 
modulo continuation-reachable heap data. On the other hand, compositional GC 
soundly disregards such data and thereby identifies such evaluations. 

Compositional GC is able to achieve this feat because it calculates the fragments
of the heap reachable from the local environment alone. Since this environment
is restricted to the free variables of the expression it closes, the resultant
heap fragment includes a tight overapproximation of the actually-relevant heap 
data. One effect is that evaluation summaries—the association of an evaluation 
configuration with its results—are general yet precise. They are general since, 
with a minimum of irrelevant heap data, more contexts are consistent with them. 
They are precise since, with a minimum of irrelevant heap data, they are less 
likely to allocate an entry at an existing address. In fact, the precision of com- 
positional GC dominates that of pushdown GC. 


Restored Context Irrelevance A semantics determines which parts of a given 
configuration are relevant to its evaluation [8]. When the continuation is irrel- 
evant to evaluation, the semantics exhibits the property of context irrelevance. 
Context irrelevance is an intuitive property: unless our semantics has control 
effects or some other explicit dependence, we would be surprised if a configu- 
ration’s continuation was relevant to its evaluation. Even a concrete semantics 
with GC exhibits context irrelevance since data reachable from the stack alone 
will not affect the result of evaluation. In an abstract semantics with GC, however,
where new allocations can occur at old addresses, the presence of data
reachable from the stack alone can affect evaluation. The set of data preserved 
by GC, which determines how evaluation is affected, is itself determined by the 
continuation. Thus, an abstract semantics in which GC is defined with respect 
to the stack violates context irrelevance. 

Put this way, it is clear why compositional GC restores context irrelevance to 
the semantics: it removes the dependence on the stack from GC itself and allows 
all data reachable from the stack alone to be collected. This restoration makes 
evaluation easier to reason about and increases the effectiveness of memoization. 


6.2 The Effect of Treating the Time Compositionally 


The k-CFA context abstraction consists of a sequence of k call sites—for each 
point in execution, the last k call sites encountered. In Section 3.5, we discussed 
how the last-k-call-sites abstraction arose as a consequence of the semantics 
threading the abstract time (i.e. the context) through execution. 

In contrast, the big-step, concrete semantics of Section 4 and the big-step, 
abstract semantics of Section 5 didn’t thread the abstract time through execution 
but treated it compositionally, installing a new time at a call but restoring the 
previous time at the corresponding return. This treatment of time induces a
different notion of context than k-CFA; instead of yielding the last-k call sites, 
it yields the top-m stack frames. 

This top-m-stack-frames context abstraction is not novel and originates with 
m-CFA [11], a family of polynomial-time CFAs. However, to our knowledge, its 
appearance here is its first in a stack-precise setting: many stack-precise CFAs 
encode context using other means than a time component (or don’t use context 
in the first place) [16,3, 1]; still others achieve the last-k-call-sites abstraction, 
incidentally or intentionally [4, 18]. 

Using the top-m stack frames to qualify heap allocation has certain advantages
over using the last-k call sites; in particular, its power to distinguish bindings
is not diluted by static call sequences. To see how k-CFA’s and m-CFA’s context 
abstractions compare, let’s consider a few examples. 

First, consider a [k = 2]CFA of the program


(define (f x) x)
(define (g y) (f y))
(g 42)
(g 35)


The abstract resource 42 is allocated in the heap twice: first when the call to
g is made and second when the call to f is made. At the point of the second
allocation, the two most-recently-encountered call sites in evaluation are (f y)
and (g 42); hence, these call sites are used to qualify the binding of 42 to x in
the heap. The treatment of the abstract resource 35 is similar except its second
allocation is qualified by (f y) and (g 35). For this program, [k = 2]CFA is
able to keep the two allocations distinct.
Next, consider a [k = 2]CFA of the similar program 


(define (f x) x)
(define (g y)
  (displayln y)
  (f y))
(g 42)
(g 35)


which includes the call (displayln y) in the body of g. As in the previous 
program, the analysis of this program allocates the abstract resources 42 and 35 
twice each. However, in this program, the second of each of their allocations is 
qualified by (f y) and (displayln y). In fact, every call to f made via g will 
occur in that same context. In a sense, the static sequence of (displayln y) 
and (f y) eats up the context budget ensuring that the analysis conflates all 
bindings made at the call (f y). (Incrementing k would remove the conflation 
in this example, but it makes the analysis more expensive and such a strategy 
can always be confounded by a longer “static” trace of calls.) 

To contrast, consider an [m = 2]CFA of the same program. Because the
context consists of the top two stack frames, the allocation of 42 is qualified by
(f y) and (g 42) and the allocation of 35 is qualified by (f y) and (g 35).
Because the second stack frame of each allocation is distinct, [m = 2]CFA is able 
to keep the bindings distinct in the analysis. 
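The difference between the two context abstractions can be reproduced with a small simulation over a trace of call and return events. The sketch below is an illustration rather than an analysis: it does not interpret the program, the event trace for each top-level call to g is written out by hand, and the names `contexts` and `run_g` are hypothetical. Threading the time keeps it unchanged across a return, whereas the compositional treatment reinstates the caller's time.

```python
def contexts(trace, limit, restore_on_return):
    """Return (call site, context) pairs for each call event in the trace."""
    t, saved, seen = (), [], []
    for event in trace:
        if event == "ret":
            previous = saved.pop()
            if restore_on_return:  # compositional: reinstate the caller's time
                t = previous
        else:
            saved.append(t)
            t = ((event,) + t)[:limit]  # prepend the call site, then truncate
            seen.append((event, t))
    return seen


def run_g(arg):
    """Hand-written call/return events for one top-level call (g arg)."""
    return [f"(g {arg})", "(displayln y)", "ret", "(f y)", "ret", "ret"]


trace = run_g(42) + run_g(35)


def at_f(pairs):
    """Contexts in force at each call to f."""
    return [t for site, t in pairs if site == "(f y)"]


k_cfa = at_f(contexts(trace, 2, restore_on_return=False))  # last 2 call sites
m_cfa = at_f(contexts(trace, 2, restore_on_return=True))   # top 2 stack frames
print(k_cfa)
print(m_cfa)
```

Under the threaded treatment both calls to f occur in the context of (f y) and (displayln y), so their bindings are conflated; under the compositional treatment the contexts end in (g 42) and (g 35) respectively and the bindings stay distinct.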

The top-m-stack-frames context abstraction is itself susceptible to deep nests 
of calls which serve only to pass parameters: if the nesting depth exceeds m, then 
the analysis will conflate the bindings made by the innermost calls. And, as with 
k-CFA, an increased m can always be confounded by a deeper nesting. In spite 
of that, the m-CFA context abstraction has been shown to work well relative to 
k-CFA in practice in a stack-imprecise setting where variables are aggressively 
re-bound [11]. Future work is needed to verify that its advantages carry over to 
a stack-precise setting. 


7 Related Work 


Broadly, this work is an instance of abstract interpretation and, more specifically, 
of control-flow analysis (CFA) [9,14]. It inherits from the Abstracting Abstract 
Machines methodology [15] of systematically deriving CFAs from purely opera- 
tional specifications. More specifically, this work is an instance of stack-precise 
CFA which is preceded by many variations [16,3, 8,6, 12, 1, 18]. 

Might and Shivers [10] first introduced GC to CFA. Reconciling GC with 
stack-precise CFAs has been the focus of significant effort. Earl et al. [4] introduced
the first technique to do so, which approximated the set of frames
that could be on any possible stack at any given control point. Johnson and Van 
Horn [8] cast this technique into a more operational framework and considered 
a more-precise variant in which a control point splits for each possible stack 
with its heap being collected with respect to that stack alone. Johnson et al. [7] 
unified these previous two works in one formal framework. Darais et al. [1] show 
that the Abstracting Definitional Interpreters approach easily accommodates 
abstract GC by introducing a machine component which contains the addresses 
embedded in stack frames; this realization of GC amounts essentially to the fully- 
precise technique. Our work sidesteps the need for all of this previous effort by 
decomposing the heap into continuation-independent fragments. 

A significant concept in the work of Johnson and Van Horn [8] is context 
irrelevance, the property that the evaluation of a configuration is independent 
of its continuation, and they note that the approximate abstract GC technique 
introduced by Earl et al. [4] violates context irrelevance. Once again, the in- 
dependence of GC from the stack under our technique sidesteps these issues; 
evaluation under our technique exhibits context irrelevance effortlessly. 

As part of the resolution of an apparent paradox regarding the complexities of 
object-oriented k-CFA and functional k-CFA, Might et al. [11] develop m-CFA, 
a stack-imprecise, polynomial-time family of CFA that employs the top-m stack 
frames as a context abstraction as opposed to the last-k call sites of k-CFA. They 
show that this abstraction is more resilient against approximation in the face 
of the aggressive rebinding that m-CFA effects. Our treatment of the abstract 
time component induces this same top-m-stack-frames context abstraction but 
in a stack-precise setting, the first such appearance in the literature, to our
knowledge. 

Although not inspired by it, our work surprisingly shares much of the per- 
spective and approach of the work of Dillig et al. [2] to verify C and C++ 
programs. In particular, both works employ a compositional approach to analy- 
sis by producing evaluation summaries and decompose the heap to support their 
approach. In addition, both works have some notion of propagation of summary 
effects: theirs is a summary transfer function; ours is an effect log. In contrast, 
our work does not produce summaries in a bottom-up fashion and is targeted to- 
ward explicitly higher-order languages with effects. Interesting future work could 
explore whether any precision-enhancing techniques of Dillig et al. [2] could be 
ported and applied, whether the bottom-up production of summaries is viable, 
or whether their general approach can be used for verification in our setting. 


8 Conclusion and Future Work 


In this paper, we showed that treating the heap compositionally in a stack-precise 
CFA removes its dependence on the stack, at once simplifying GC and increasing 
its effectiveness. As a result, the analysis produces more compact and precise 
evaluation summaries that are more amenable to reuse. We also showed that 
treating the time component compositionally induces the top-m-stack-frames 
context abstraction of m-CFA. Unlike k-CFA’s last-k-call-sites context abstrac- 
tion, m-CFA’s need not devote any precision to static call sequences. 

Interestingly, the notion of context shared by k-CFA and m-CFA—calling 
context, roughly—seems to be at odds with summary reuse. In a stack-precise 
1CFA (which exhibits the same context abstraction whether it is [k = 1]CFA or
[m = 1]CFA), the syntactic call site of the caller is encoded in the summary of 
the callee, preventing the summary’s reuse at any other call site. If this tension 
is fundamental, it might benefit to look to alternative notions of context—extant 
and novel. 

The complement to abstract GC is abstract counting [10] which keeps track 
of the number of concrete resources that correspond to an abstract resource 
and enables certain abstract transitions, such as a strong store update. If
abstract counting can be applied to heap fragments such that the overlap among
fragments is accounted for correctly, it might be possible to detect opportunities 
to perform strong updates to heap bindings which would further increase the 
precision of our technique. 

Finally, Darais et al. [1] consider a particular value abstraction in which 
primitive operations propagate imprecision but do not introduce it. Their ab- 
straction suggests a generalization in which each “basic block” is analyzed at full 
precision and imprecision occurs only at the join points of control flow. CFA2’s 
stack environments capture an aspect of this generalization and it appears our 
technique does as well. However, a focused investigation would reveal whether 
such a generalization can be more-fully realized. 
Abstract. Solidity is the dominant programming language for Ethereum
smart contracts. This paper presents a high-level formalization of the So- 
lidity language with a focus on the memory model. The presented formal- 
ization covers all features of the language related to managing state and 
memory. In addition, the formalization we provide is effective: all but few 
features can be encoded in the quantifier-free fragment of standard SMT 
theories. This enables precise and efficient reasoning about the state of 
smart contracts written in Solidity. The formalization is implemented in 
the SOLC-VERIFY verifier and we provide an extensive set of tests that 
covers the breadth of the required semantics. We also provide an evalu- 
ation on the test set that validates the semantics and shows the novelty 
of the approach compared to other Solidity-level contract analysis tools. 


1 Introduction 


Ethereum [32] is a public blockchain platform that provides a novel computing 
paradigm for developing decentralized applications. Ethereum allows the deploy- 
ment of arbitrary programs (termed smart contracts [31]) that operate over the 
blockchain state. The public can interact with the contracts via transactions. It 
is currently the most popular public blockchain with smart contract functional- 
ity. While the nodes participating in the Ethereum network operate a low-level, 
stack-based virtual machine (EVM) that executes the compiled smart contracts, 
the contracts themselves are mostly written in a high-level, contract-oriented 
programming language called Solidity [30]. 

Even though smart contracts are generally short, they are no less prone 
to errors than software in general. In the Ethereum context, any flaws in the 
contract code come with potentially devastating financial consequences (such as 
the infamous DAO exploit [17]). This has inspired a great interest in applying 
formal verification techniques to Ethereum smart contracts (see e.g., [4] or [14] for 
surveys). In order to apply formal verification of any kind, be it static analysis or 
model checking, the first step is to formalize the semantics of the programming 
language that the smart contracts are written in. Such semantics should not 
remain a mere exercise in formalization, but should preferably be developed 
into precise and automated verification tools. 

Early approaches to verification of Ethereum smart contracts focused mostly 
on formalizing the low-level virtual machine precisely (see, e.g., [11,19,21,22,2]). 
However, the unnecessary details of the EVM execution model make it difficult to 
reason about high-level functional properties of contracts (as they were written 
by developers) in an effective and automated way. For Solidity-level properties 
of smart contracts, Solidity-level semantics are preferred. While some aspects 
of Solidity have been studied and formalized [23,10,15,33], the semantics of the 
Solidity memory model still lacks a detailed and precise formalization that also 
enables automation. 

The memory model of Solidity has various unusual and non-trivial behaviors, 
providing a fertile ground for potential bugs. Smart contracts have access to two 
classes of data storage: a permanent storage that is a part of the global blockchain 
state, and a transient local memory used when executing transactions. While the 
local memory uses a standard heap of entities with references, the permanent 
storage has pure value semantics (although pointers to storage can be declared 
locally). This memory model that combines both value and reference semantics, 
with all interactions between the two, poses some interesting challenges but 
also offers great opportunities for automation. For example, the value semantics 
of storage ensures non-aliasing of storage data. This can, if supported by an 
appropriate encoding of the semantics, potentially improve both the precision 
and effectiveness of reasoning about contract storage. 

This paper provides a formalization of the Solidity semantics in terms of a 
simple SMT-based intermediate language that covers all features related to man- 
aging contract storage and memory. A major contribution of our formalization 
is that all but a few of its elements can be encoded in the quantifier-free fragment 
of standard SMT theories. Additionally, our formalization captures the value se- 
mantics of storage with implicit non-aliasing information of storage entities. This 
allows precise and effective verification of Solidity smart contracts using modern 
SMT solvers. The formalization is implemented in the open-source SOLC-VERIFY 
tool [20], which is a modular verifier for Solidity based on SMT solvers. We val- 
idate the formalization and demonstrate its effectiveness by evaluating it on a 
comprehensive set of tests that exercise the memory model. We show that our 
formalization significantly improves the precision and soundness compared to 
existing Solidity-level verifiers, while remarkably outperforming low-level EVM- 
based tools in terms of efficiency. 


2 Background 


2.1 Ethereum 


Ethereum [32,3] is a generic blockchain-based distributed computing platform. 
The Ethereum ledger is a storage layer for a database of accounts (identified 


226 A. Hajdu and D. Jovanović 


by addresses) and the data associated with the accounts. Every account has 
an associated balance in Ether (the native cryptocurrency of Ethereum). In 
addition, an account can also be associated with the executable bytecode of a 
contract and the contract state. 

Although Ethereum contracts are deployed to the blockchain in the form 
of the bytecode of the Ethereum Virtual Machine (EVM) [32], they are gener- 
ally written in a high-level programming language called Solidity [30] and then 
compiled to EVM bytecode. After deployment, the contract is publicly acces- 
sible and its code cannot be modified. An external user, or another contract, 
can interact with a contract through its API by invoking its public functions. 
This can be done by issuing a transaction that encodes the function to be called 
with its arguments, and contains the contract’s address as the recipient. The 
Ethereum network then executes the transaction by running the contract code 
in the context of the contract instance. 

A contract instance has access to two different kinds of memory during its 
lifetime: contract storage and memory. Contract storage is a dedicated data 
store for a contract to store its persistent state. At the level of the EVM, it is 
an array of 256-bit storage slots stored on the blockchain. Contract data that 
fits into a slot, or can be sliced into a fixed number of slots, is usually allocated 
starting from slot 0. More complex data types that do not fit into a fixed number 
of slots, such as mappings, or dynamic arrays, are not supported directly by the 
EVM. Instead, they are implemented by the Solidity compiler using storage as a 
hash table where the structured data is distributed in a deterministic collision- 
free manner. Contract memory is used during the execution of a transaction on 
the contract, and is deleted after the transaction finishes. This is where function 
parameters, return values and temporary data can be allocated and stored. 


2.2 Solidity 


Solidity [30] is the high-level programming language supporting the develop- 
ment of Ethereum smart contracts. It is a full-fledged object-oriented program- 
ming language with many features focusing on enabling rapid development of 
Ethereum smart contracts. The focus of this paper is the semantics of the Solid- 
ity memory model: the Solidity view of contract storage and memory, and the 
operations that can modify it. Thus, we restrict the presentation to a generous 
fragment of Solidity that is relevant for discussing and formalizing the memory 
model. An example contract that illustrates relevant features is shown in Fig- 
ure 1, and the abstract syntax of the targeted fragment is presented in Figure 2. 
We omit parts of Solidity that are not relevant to the memory model (e.g., in- 
heritance, loops, blockchain-specific members). We also omit low-level, unsafe 
features that can break the Solidity memory model abstractions (e.g., assembly 
and delegatecall). 


3 There is an additional data location named calldata that behaves the same as mem- 
ory, but is used to store parameters of external functions. For simplicity, we omit it 
in this paper. 
contract DataStorage {
  struct Record {
    bool set;
    int[] data;
  }

  mapping(address=>Record) private records;

  function append(address at, int d) public {
    Record storage r = records[at];
    r.set = true;
    r.data.push(d);
  }

  function isset(Record storage r) internal view returns (bool s) {
    s = r.set;
  }

  function get(address at) public view returns (int[] memory ret) {
    require(isset(records[at]));
    ret = records[at].data;
  }
}


Fig. 1: An example contract illustrating commonly used features of the Solidity 
memory model. The contract keeps an association between addresses and data 
and allows users to query and append to their data. 


Contracts. Solidity contracts are similar to classes in object-oriented program- 
ming. A contract can define any additional types needed, followed by the dec- 
laration of the state variables and contract functions, including an optional sin- 
gle constructor function. The contract’s state variables define the only persis- 
tent data that the contract instance stores on the blockchain. The constructor 
function is only used once, when a new contract instance is deployed to the 
blockchain. Other public contract functions can be invoked arbitrarily by exter- 
nal users through an Ethereum transaction that encodes the function call data 
and designates the contract instance as the recipient of the transaction. 


Example 1. The contract DataStorage in Figure 1 defines a struct type Record. 
Then it defines the contract storage as a single state variable records. Finally, 
three contract functions are defined: append(), isset(), and get(). Note that 
a constructor is not defined and, in this case, a default constructor is provided 
to initialize the contract state to default values. 


Solidity supports further concepts from object-oriented programming, such as in- 
heritance, function modifiers, and overloading (also covered by our implementa- 
tion [20]). However, as these are not relevant for the formalization of the memory 
model we omit them to simplify our presentation. 


Types. Solidity is statically typed and provides two classes of types: value types 
and reference types. Value types include elementary types such as addresses, 
integers, and Booleans that are always passed by value. Reference types, on the 
other hand, are passed by reference and include structs, arrays and mappings. 




TypeName  ::= address | int | uint | bool              Value types
            | mapping(TypeName => TypeName)            Mapping
            | TypeName[] | TypeName[n]                 Arrays
            | StructName                               Struct name

DataLoc   ::= storage | memory                         Data location

lval      ::= id                                       Identifier
            | expr.id                                  Member access
            | expr[expr]                               Index access

expr      ::= lval                                     Lvalue
            | expr ? expr : expr                       Conditional
            | new TypeName[](expr)                     New memory array
            | StructName(expr)                         New memory struct

stmt      ::= TypeName DataLoc? id [= expr];           Local variable declaration
            | (lval)* = (expr)*;                       Assignment (tuples)
            | lval.push(expr);                         Push
            | lval.pop();                              Pop
            | delete lval;                             Delete

StructMem ::= TypeName id;                             Struct member

StructDef ::= struct StructName { StructMem* }         Struct definition

StateVar  ::= TypeName id;                             State variable definition

FunPar    ::= TypeName DataLoc? id                     Function parameter

Fun       ::= function id(FunPar*)                     Function definition
              [returns (FunPar*)] { stmt* }

Constr    ::= constructor(FunPar*) { stmt* }           Constructor definition

Contract  ::= contract id                              Contract definition
              { StructDef* StateVar* Constr? Fun* }


Fig. 2: Syntax of the targeted Solidity fragment. 


A struct consists of a fixed number of members. An array is either fixed-size or 
dynamically-sized and besides the elements of the base type, it also includes a 
length field holding the number of elements. A mapping is an associative array 
mapping keys to values. The important caveat is that the table does not actually 
store the keys so it is not possible to check if a key is defined in the map. 


Example 2. The contract in Figure 1 uses the following types. The records 
variable is a mapping from addresses to Record structures which, in turn, consist 
of a Boolean value and a dynamically-sized integer array. It is a common practice 
to define a struct with a Boolean member (set) to indicate that a mapping value 
has been set. This is because Solidity mappings do not store keys: any key can 
be queried, returning a default value if no value was associated previously. 
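
As a rough analogy outside Solidity, the "set" flag idiom can be mimicked in Python. The sketch below is only illustrative: the names Record, records, append, and isset mirror Figure 1 but are hypothetical Python stand-ins, and a defaultdict plays the role of a mapping that yields a default value for any key.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List


@dataclass
class Record:
    # Defaults mirror the Solidity defaults: false and an empty array.
    is_set: bool = False
    data: List[int] = field(default_factory=list)


# Stand-in for `mapping(address => Record)`: any key yields a default Record.
records = defaultdict(Record)


def append(at: int, d: int) -> None:
    r = records[at]
    r.is_set = True
    r.data.append(d)


def isset(at: int) -> bool:
    # The flag is the only way to tell "never written" from "written".
    return records[at].is_set
```

Unlike a Solidity mapping, the defaultdict does remember which keys were accessed; the sketch only illustrates the default-value behavior that makes the explicit flag necessary.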


Data locations for reference types. Data of reference types resides in a data 
location that is either storage or memory. Storage is the persistent store used 
for state variables of the contract. In contrast, memory is used during execution 
of a transaction to store function parameters, return values and local variables, 
and it is deleted after the transaction finishes. 
Semantics of reference types differ fundamentally depending on the data loca- 
tion that they are stored in. Layout of data in the memory data location resem- 
bles the memory model common in Java-like programming languages: there is a 
heap where reference types are allocated and any entity in the heap can contain 
values of value types, and references to other memory entities. In contrast, the 
storage data location treats and stores all entities, including those of reference 
types, as values with no references involved. Mixing storage and memory is not 
possible: the data location of a reference type is propagated to its elements and 
members. This means that storage entities cannot have references to memory 
entities, and memory entities cannot have reference types as values. Storage of 
a contract can be viewed as a single value with no aliasing possible. 
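
The contrast between the two data locations can be sketched in Python under some simplifying assumptions: nested dictionaries and lists stand in for structs and arrays, a deep copy stands in for assignment between storage entities, and plain name binding stands in for assignment between memory references. All variable names are hypothetical.

```python
import copy

# Storage-like value semantics: assignment copies the whole value.
storage_a = {"x": 1, "ta": [{"z": 10}, {"z": 20}]}
storage_b = copy.deepcopy(storage_a)  # models assignment between storage entities
storage_b["ta"][0]["z"] = 99          # storage_a is unaffected: no aliasing

# Memory-like reference semantics: assignment copies a reference.
memory_a = {"x": 1, "ta": [{"z": 10}, {"z": 20}]}
memory_b = memory_a                   # both names refer to the same heap entity
memory_b["ta"][0]["z"] = 99           # the change is visible through memory_a
```

After running the sketch, storage_a still holds 10 in its first element while memory_a holds 99, which is the aliasing difference described above.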


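contract C {
  struct T {
    int z;
  }
  struct S {
    ...
  }
  ...
}

function f(S memory sm1) public {
  T memory tm = sm1.ta[1];
  S memory sm2 = S(0, sm1.ta);
}

[Panels (b) and (c): diagrams of the storage values and the memory layout are not recoverable from the extracted text.]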


Fig. 3: An example illustrating reference types (structs and arrays) and their lay- 
out in storage and memory: (a) a contract defining types and state variables; (b) 
an abstract representation of the contract storage as values; and, (c) a function 
using the memory data location and a possible layout of the data in memory. 


Example 3. Consider the contract C defined in Figure 3a. The contract defines 
two reference struct types S and T, and declares state variables s, t, and sa. 
These variables are maintained in storage during the contract lifetime and they 
are represented as values with no references within. A potential value of these 
variables is shown in Figure 3b. On the other hand, the top of Figure 3c shows a 
function with three variables in the memory data location, one as the argument 
to the function, and two defined within the function. Because they are in memory, 
these variables are references to heap locations. Any data of reference types, 
stored within the structures and arrays, is also a reference and can be reallocated 
or assigned to point to an existing heap location. This means that the layout of 
the data can contain arbitrary graphs with arbitrary aliasing. A potential layout 
of these variables is shown at the bottom of Figure 3c. 


Functions. Functions are the Solidity equivalent of methods in classes. They 
receive data as arguments, perform computations, manipulate state variables 
and interact with other Ethereum accounts. Besides accessing the storage of the 
contract through its state variables, functions can also define local variables, in- 
cluding function arguments and return values. Variables of value types are stored 
as values on a stack. Variables of reference types must be explicitly declared with 
a data location, and are always pointers to an entity in that data location (stor- 
age or memory). A pointer to storage is called a local storage pointer. As the 
storage is not memory in the usual sense, but a value instead, one can see storage 
pointers as encoding a path to one reference type entity in the storage. 


Example 4. Consider the example in Figure 1. The local variable r in function 
append() points to the struct at index at of the state variable records (residing 
in the contract storage). In contrast, the return value ret of function get() is 
a pointer to an integer array in memory. 


Statements and expressions. Solidity includes usual programming statements 
and control structures. To keep the presentation simple, we focus on the state- 
ments that are related to the formalization of the memory model: local variable 
declarations, assignments, array manipulation, and the delete statement.⁴ Solidity 
expressions relevant for the memory model are identifiers, member and 
array accesses, conditionals and allocation of new arrays and structs in memory. 

If a value is not provided, local variable declarations automatically initialize 
the variable to a default value. For reference types in memory, this allocates new 
entities on the heap and performs recursive initialization of its members. For 
reference types in storage, the local storage pointers must always be explicitly 
initialized to point to a storage member. This ensures that no pointer is ever 
“null”. Value types are initialized to their simple default value (0, false). Behavior 
of assignment in Solidity is complex (see Section 3.5) and depends on the data 
location of its arguments (e.g., deep copy or pointer assignment). Dynamically- 
sized storage arrays can be extended by pushing an element to their end, or 
can be shrunk by popping. The delete statement assigns the default value 
(recursively for reference types) to a given entity based on its type. 
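
Recursive default initialization can be sketched as follows. This is a minimal illustration over a hypothetical tuple-based type description (not the representation used by any real tool): value types get 0 or false, fixed-size arrays get default elements, dynamic arrays (size 0 here) start empty, and structs are initialized member by member.

```python
def default_value(ty):
    """Return the default value for a small, hypothetical type description."""
    kind = ty[0]
    if kind in ("int", "uint", "address"):
        return 0
    if kind == "bool":
        return False
    if kind == "array":   # ("array", base_type, n); n = 0 models a dynamic array
        _, base, n = ty
        return [default_value(base) for _ in range(n)]
    if kind == "struct":  # ("struct", [(member_name, member_type), ...])
        return {name: default_value(member) for name, member in ty[1]}
    raise ValueError(f"unknown type: {ty!r}")
```

For instance, a description of the Record struct from Figure 1 would be ("struct", [("set", ("bool",)), ("data", ("array", ("int",), 0))]).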


Example 5. The assignment r.set = true in the append() function of Figure 1 
is a simple value assignment. On the other hand, ret = records[at].data in 
the get() function allocates a new array on the heap and performs a deep copy 
of data from storage to memory. 


2.3 SMT-Based Programs 


We formalize the semantics of the Solidity fragment by translating it to a simple 
programming language that uses SMT semantics [9,12] for the types and data. 
The syntax of this language is shown in Figure 4. The syntax is purposefully 


4 Our implementation [20] supports a majority of statements, excluding low-level 
operations (such as inline assembly). Loops are also supported and can be specified 
with loop invariants. 
TypeName    ::= int | bool                          Integer, Boolean
              | [TypeName]TypeName                  SMT array
              | DataTypeName                        SMT datatype
DataTypeDef ::= DataTypeName((id : TypeName)*)      Datatype definition
expr        ::= id                                  Identifier
              | expr[expr]                          Array read
              | expr[expr ← expr]                   Array write
              | DataTypeName(expr*)                 Datatype constructor
              | expr.id                             Member selector
              | ite(expr, expr, expr)               Conditional
              | expr + expr | expr - expr           Arithmetic expression
VarDecl     ::= id : TypeName                       Variable declaration
stmt        ::= id := expr                          Assignment
              | if expr then stmt* else stmt*       If-then-else
              | assume(expr)                        Assumption
Program     ::= DataTypeDef* VarDecl* stmt*         Program definition


Fig. 4: Syntax of SMT-based programs. 


minimal and generic, so that it can be expressed in any modern SMT-based 
verification tool (e.g., Boogie [5], Why3 [18] or Dafny [26]).⁵ 

The types of SMT-based programs are the SMT types: simple value types 
such as Booleans and mathematical integers, and structured types such as ar- 
rays [27,16] and inductive datatypes [8]. The expressions of the language are 
standard SMT expressions such as identifiers, array reads and writes, datatype 
constructors, member selectors, conditionals and basic arithmetic [7]. All vari- 
ables are declared at the beginning of a program. The statements of the language 
are limited to assignments, the if-then-else statement, and assumption statement. 

SMT-based programs are a good fit for modeling of program semantics. For 
one, they have clear semantics with no ambiguities. Furthermore, any property 
of the program can be checked with SMT solvers: the program can be translated 
directly to an SMT formula by a static single assignment (SSA) transformation. 
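
To give an idea of what such a transformation does, the following sketch renames variables in a straight-line sequence of assignments so that each version is assigned exactly once, yielding one equality per statement. It is a deliberately reduced illustration: the function to_ssa and its string-based statement format are hypothetical, and if-then-else and assume statements are not handled.

```python
import re
from typing import Dict, List, Tuple


def to_ssa(stmts: List[Tuple[str, str]]) -> List[str]:
    """Turn (target, expression) assignments into single-assignment equalities."""
    version: Dict[str, int] = {}
    constraints: List[str] = []

    def current(name: str) -> str:
        return f"{name}_{version.get(name, 0)}"

    for target, expr in stmts:
        # Reads refer to the latest version of each variable.
        rhs = re.sub(r"[A-Za-z_]\w*", lambda m: current(m.group(0)), expr)
        # The write introduces a fresh version of the target.
        version[target] = version.get(target, 0) + 1
        constraints.append(f"{current(target)} = {rhs}")
    return constraints
```

The conjunction of the returned equalities is then a formula over the versioned variables, which is the shape of input an SMT solver expects.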

Note that the syntax requires the left hand side of an assignment to be an 
identifier. However, to make our presentation simpler, we will allow array read, 
member access and conditional expressions (and their combination) as LHS. 
Such constructs can be eliminated iteratively in the following way until only 
identifiers appear as LHS in assignments. 


- $a[i] := e$ is equivalent to $a := a[i \leftarrow e]$.
- $d.m_i := e$ is equivalent to $d := D(d.m_1, \ldots, d.m_{i-1}, e, d.m_{i+1}, \ldots, d.m_n)$, where $D$ is the constructor of a datatype with members $m_1, \ldots, m_n$.
- $ite(c, t, f) := e$ is equivalent to if $c$ then $t := e$ else $f := e$.
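
The three rewrite rules above can be sketched as a small recursive function. This is an illustration under assumptions: expressions are hypothetical tagged tuples, and ctor_of is an assumed callback that returns the constructor name and member list for the datatype of an expression.

```python
def eliminate(lhs, rhs, ctor_of):
    """Rewrite `lhs := rhs` until every left-hand side is an identifier.

    Expressions: ("id", x), ("read", a, i), ("write", a, i, v),
    ("sel", d, m), ("ctor", D, args), ("ite", c, t, f).
    """
    kind = lhs[0]
    if kind == "id":
        return [("assign", lhs[1], rhs)]
    if kind == "read":   # a[i] := e  becomes  a := a[i <- e]
        _, arr, idx = lhs
        return eliminate(arr, ("write", arr, idx, rhs), ctor_of)
    if kind == "sel":    # d.m_i := e  becomes  d := D(d.m_1, ..., e, ..., d.m_n)
        _, dt, member = lhs
        name, members = ctor_of(dt)
        args = [rhs if m == member else ("sel", dt, m) for m in members]
        return eliminate(dt, ("ctor", name, args), ctor_of)
    if kind == "ite":    # ite(c, t, f) := e  becomes  if c then t := e else f := e
        _, cond, then_lhs, else_lhs = lhs
        return [("if", cond,
                 eliminate(then_lhs, rhs, ctor_of),
                 eliminate(else_lhs, rhs, ctor_of))]
    raise ValueError(f"not an lvalue: {lhs!r}")
```

Because each rule strips one layer from the left-hand side, nested targets such as a[i].m are handled by repeated application, ending with a single assignment to a.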


5 Our current implementation is based on Boogie, but we have plans to introduce a 
generic intermediate representation that could incorporate alternate backends such 
as Why3 or Dafny. 
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3 Formalization 


In this section we present our formalization of the Solidity semantics through 
a translation that maps Solidity elements to constructs in the SMT-based lan- 
guage. The formalization is described top-down in separate subsections for types, 
contracts, state variables, functions, statements, and expressions. 


3.1 Types 


We use $\mathcal{T}(\cdot)$ to denote the function that maps a Solidity type to an SMT type. 
This function is used in the translation of contract elements and can, as a side 
effect, introduce datatype definitions and variable declarations. This is denoted 
with [decl] in the result of the function. To simplify the presentation, we assume 
that such side effects are automatically added to the preamble of the SMT program. 
Furthermore, we assume that declarations with the same name are only 
added once. We use type(expr) to denote the original (Solidity) type of an 
expression (to be used later in the formalization). The definition of $\mathcal{T}(\cdot)$ is shown 
in Figure 5. 


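$\mathcal{T}(\texttt{bool}) = \mathit{bool}$
$\mathcal{T}(\texttt{address}) = \mathcal{T}(\texttt{int}) = \mathcal{T}(\texttt{uint}) = \mathit{int}$
$\mathcal{T}(\texttt{mapping(}K\texttt{=>}V\texttt{) storage}) = [\mathcal{T}(K)]\,\mathcal{T}(V)$
$\mathcal{T}(\texttt{mapping(}K\texttt{=>}V\texttt{) storptr}) = [\mathit{int}]\,\mathit{int}$
$\mathcal{T}(T\texttt{[}n\texttt{] storage}) = \mathcal{T}(T\texttt{[] storage})$
$\mathcal{T}(T\texttt{[}n\texttt{] storptr}) = \mathcal{T}(T\texttt{[] storptr})$
$\mathcal{T}(T\texttt{[}n\texttt{] memory}) = \mathcal{T}(T\texttt{[] memory})$
$\mathcal{T}(T\texttt{[] storage}) = \mathit{StorArr}_T$ with $[\mathit{StorArr}_T(\mathit{arr} : [\mathit{int}]\,\mathcal{T}(T),\ \mathit{length} : \mathit{int})]$
$\mathcal{T}(T\texttt{[] storptr}) = [\mathit{int}]\,\mathit{int}$
$\mathcal{T}(T\texttt{[] memory}) = \mathit{int}$ with $[\mathit{MemArr}_T(\mathit{arr} : [\mathit{int}]\,\mathcal{T}(T),\ \mathit{length} : \mathit{int})]$ and $[\mathit{arrheap}_T : [\mathit{int}]\,\mathit{MemArr}_T]$
$\mathcal{T}(\texttt{struct } S \texttt{ storage}) = \mathit{StorStruct}_S$ with $[\mathit{StorStruct}_S(\ldots, m_i : \mathcal{T}(S_i), \ldots)]$
$\mathcal{T}(\texttt{struct } S \texttt{ storptr}) = [\mathit{int}]\,\mathit{int}$
$\mathcal{T}(\texttt{struct } S \texttt{ memory}) = \mathit{int}$ with $[\mathit{MemStruct}_S(\ldots, m_i : \mathcal{T}(S_i), \ldots)]$ and $[\mathit{structheap}_S : [\mathit{int}]\,\mathit{MemStruct}_S]$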


Fig. 5: Formalization of Solidity types. Members of struct S are denoted as $m_i$ 
with types $S_i$. 


Value types. Booleans are mapped to SMT Booleans while other value types 
are mapped to SMT integers. Addresses are also mapped to SMT integers so 
that arithmetic comparison and conversions between integers and addresses are 
supported. For simplicity, we map all integers (signed or unsigned) to SMT 
integers. Solidity also allows function types to store, pass around, and call 
functions, but this is not yet supported by our encoding. 


Reference types. The Solidity syntax does not always require the data location 
for variable and parameter declarations. However, for reference types it is always 
required (enforced by the compiler), except for state variables that are always 
implicitly storage. In our formalization, we assume that the data location of 
reference types is a part of the type. As discussed before, memory entities are 
always accessed through pointers. However, for storage we distinguish whether 
it is the storage reference itself (e.g., state variable) or a storage pointer (e.g., 
local variable, function parameter). We denote the former with storage and the 
latter with storptr in the type name. Our modeling of reference types relies on 
the generalized theory of arrays [16] and the theory of inductive data-types [8], 
both of which are supported by modern SMT solvers (e.g., CVC4 [6] and Z3 [28]). 


Mappings and arrays. For both arrays and mappings, we abstract away the 
implementation details of Solidity and model them with the SMT theory of 
arrays and inductive datatypes. We formalize Solidity mappings simply as SMT 
arrays. Both fixed- and dynamically-sized arrays are translated using the same 
SMT type and we only treat them differently in the context of statements and 
expressions. Strings and byte arrays are not discussed here, but we support them 
as particular instances of the array type. To ensure that array size is properly 
modeled we keep track of it in the datatype (length) along with the actual 
elements (arr). 

For storage array types with base type T, we introduce an SMT datatype 
$\mathit{StorArr}_T$ with a constructor that takes two arguments: an inner SMT array (arr) 
associating integer indexes and the recursively translated base type ($\mathcal{T}(T)$), and 
an integer length. The advantage of this encoding is that the value semantics 
of storage data is provided by construction: each array element is a separate 
entity (no aliasing) and assigning storage arrays in SMT makes a deep copy. 
This encoding also generalizes if the base type is a reference type. 

For memory array types with base type T, we introduce a separate datatype 
$\mathit{MemArr}_T$ (side effect). However, memory arrays are stored with pointer values. 
Therefore the memory array type is mapped to integers, and a heap ($\mathit{arrheap}_T$) 
is introduced to associate integers (pointers) with the actual memory array 
datatypes. Note that mixing data locations within a reference type is not possible: 
the element type of the array has the same data location as the array itself. 
Therefore, it is enough to introduce two datatypes per element type T: one for 
storage and one for memory. In the former case the element type will have value 
semantics whereas in the latter case elements will be stored as pointers. 
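
The difference between the two array encodings can be illustrated with a small Python sketch. It rests on simplifying assumptions: immutable tuples stand in for SMT arrays (which are really total maps), elements are plain integers, and the helper names stor_write and mem_write are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Dict, Tuple


@dataclass(frozen=True)
class StorArr:
    """Value-semantics array: immutable elements plus a length."""
    arr: Tuple[int, ...]
    length: int


@dataclass(frozen=True)
class MemArr:
    """Array datatype stored in a heap and reached through an integer pointer."""
    arr: Tuple[int, ...]
    length: int


def stor_write(a: StorArr, i: int, v: int) -> StorArr:
    # A write yields a new value; other copies of the array are unaffected.
    return replace(a, arr=a.arr[:i] + (v,) + a.arr[i + 1:])


def mem_write(heap: Dict[int, MemArr], ptr: int, i: int, v: int) -> Dict[int, MemArr]:
    # A write updates the heap entry, so every pointer equal to ptr observes it.
    old = heap[ptr]
    new_heap = dict(heap)
    new_heap[ptr] = replace(old, arr=old.arr[:i] + (v,) + old.arr[i + 1:])
    return new_heap
```

Two storage variables holding equal StorArr values stay independent after a write to one of them, whereas two memory variables holding the same pointer both see a write made through either.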


Structs. For each storage struct type S the translation introduces an inductive 
datatype $\mathit{StorStruct}_S$, including a constructor for each struct member with types 


6 Note that this does not capture the precise machine integer semantics, but this is 
not relevant from the perspective of the memory model. Precise computation can be 
provided by relying on SMT bitvectors or modular arithmetic (see, e.g., [20]). 
mapped recursively. Similarly to arrays, this ensures the value semantics of storage 
such as non-aliasing and deep copy assignments. For each memory struct S 
we also introduce a datatype $\mathit{MemStruct}_S$ and a constructor for each member.⁷ 
However, the memory struct type itself is mapped to integers (pointer) and a 
heap ($\mathit{structheap}_S$) is introduced to associate the pointers with the actual memory 
struct datatypes. Note that if a memory struct has members with reference 
types, they are also pointers, which is ensured recursively by our encoding. 
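
The shape of the type mapping in Figure 5 can be summarized by a short function. This is a sketch under assumptions: types are hypothetical tagged tuples, results are plain strings, the datatype and heap declarations introduced as side effects are omitted, and mappings are only meaningful for storage.

```python
def smt_type(ty, loc):
    """Map a hypothetical Solidity type description to an SMT type name."""
    kind = ty[0]
    if kind == "bool":
        return "bool"
    if kind in ("address", "int", "uint"):
        return "int"
    if loc == "storptr":
        return "[int]int"   # every storage pointer is an array encoding a path
    if kind == "mapping":   # ("mapping", key_type, value_type), storage only
        return f"[{smt_type(ty[1], loc)}]{smt_type(ty[2], loc)}"
    if kind == "array":     # ("array", base_name, base_type)
        return f"StorArr_{ty[1]}" if loc == "storage" else "int"
    if kind == "struct":    # ("struct", struct_name)
        return f"StorStruct_{ty[1]}" if loc == "storage" else "int"
    raise ValueError(f"unknown type: {ty!r}")
```

Memory reference types map to int because they are pointers into the corresponding heap, while storage reference types map to datatypes that carry their contents by value.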


3.2 Local Storage Pointers 


An interesting aspect of the storage data location is that, although the stored 
data has value semantics, it is still possible to define pointers to an entity in 
storage within a local context, e.g., with function parameters or local variables. 
These pointers are called local storage pointers. 


Example 6. In the append() function of Figure 1 the variable r is defined to be 
a convenience pointer into the storage map records[at]. Similarly, the isset() 
function takes a storage pointer to a Record entity in storage as an argument. 


Since our formalization uses SMT datatypes to encode the contract data in stor- 
age, it is not possible to encode these pointers directly. A partial solution would 
be to substitute each occurrence of the local pointer with the expression that is 
assigned to it when it was defined. However, this approach is too simplistic and 
has limitations. Local storage pointers can be reassigned, or assigned condition- 
ally, or it might not be known at compile time which definition should be used. 
Furthermore, local storage pointers can also be passed in as function arguments: 
they can point to different storage entities for different calls. 

We propose an approach to encode local storage pointers while overcoming 
these limitations. Our encoding relies on the fact that storage data of a contract 
can be viewed as a finite-depth tree of values. As such, each element of the stored 
data can be uniquely identified by a finite path leading to it. 


Example 7. Consider the contract C in Figure 6a. The contract defines structs 
T and S, and state variables of these types. If we are interested in all storage 
entities of type T, we can consider the sub-tree of the contract storage tree that 
has leaves of type T, as depicted in Figure 6b. The root of the tree is the contract 
itself, with indexed sub-nodes for state variables, in order. For nodes of struct 
type there are indexed sub-nodes leading to its members, in order. For each node 
of array type there is a sub-node for the base type. Every pointer to a storage T 
entity can be identified by a path in this tree: by fixing the index to each state 


7 Mappings in Solidity cannot reside in memory. If a struct defines a mapping member 
and it is stored in memory, the mapping is simply inaccessible. Such members could 
be omitted from the constructor. 

8 Solidity does support a limited form of recursive data-types. Such types could make 
the storage a tree of potentially arbitrary depth. We chose not to support such types 
as recursion is non-existent in Solidity types used in practice. 
(a)
contract C {
  struct T {
    int z;
  }
  struct S {
    int x;
    ...
  }
  ...
}

(b) [Storage tree diagram; not recoverable from the extracted text.]

(c)
ite(ptr[0] = 0,
    t1,
    ite(ptr[0] = 1,
        ite(ptr[1] = 0,
            s1.t,
            s1.ts[ptr[2]]),
        ite(ptr[2] = 0,
            ss[ptr[1]].t,
            ss[ptr[1]].ts[ptr[3]])))


Fig. 6: An example of packing and unpacking: (a) contract with struct definitions 
and state variables; (b) the storage tree of the contract for type T; and (c) the 
unpacking expression for storage pointers of type T. 


variable, member, and array index, as seen in brackets in Figure 6b, such paths 
can be encoded as an array of integers. For example, the state variable t1 can 
be represented as [0], the member s1.t as [1,0], and ss[8].ts[5] as [2,8,1,5]. 
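
A path such as [2,8,1,5] can be stored in an integer-to-integer array. The sketch below models such an array in Python as a dictionary whose missing keys read as 0; the helper names const_arr, store, select, and encode_path are hypothetical and only mimic the usual array operations.

```python
from typing import Dict, List


def const_arr() -> Dict[int, int]:
    """Stand-in for a constant-zero array: missing keys read as 0."""
    return {}


def store(arr: Dict[int, int], idx: int, val: int) -> Dict[int, int]:
    # Functional update: the original array is left untouched.
    new = dict(arr)
    new[idx] = val
    return new


def select(arr: Dict[int, int], idx: int) -> int:
    return arr.get(idx, 0)


def encode_path(path: List[int]) -> Dict[int, int]:
    """Write the i-th step of the path at position i of the array."""
    result = const_arr()
    for depth, index in enumerate(path):
        result = store(result, depth, index)
    return result
```

For example, encode_path([2, 8, 1, 5]) yields an array that reads 2, 8, 1, 5 at positions 0 to 3 and 0 everywhere else.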


This idea allows us to encode storage pointer types (pointing to arrays, structs 
or mappings) simply as SMT arrays ([int]int). The novelty of our approach is 
that storage pointers can be encoded and passed around, while maintaining the 
value semantics of storage data, without the need for quantifiers to describe 
non-aliasing. To encode storage pointers, we need to address initialization and 
dereference of storage pointers, while assignment is simply an assignment of 
array values. When a storage pointer is initialized to a concrete expression, we 
pack the indexed path to the storage entity (that the expression references) into 
an array value. When a storage pointer is dereferenced (e.g., by indexing into or 
accessing a member), the array is unpacked into a conditional expression that 
will evaluate to a storage entity by decoding paths in the tree. 
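
Unpacking can be illustrated on concrete values. The sketch assumes a storage layout consistent with the paths in Example 7 (a state variable t1 at index 0, s1 at index 1 with members t and ts, and an array ss at index 2); the function unpack_T and the dictionary-based storage are hypothetical, and Python conditionals stand in for the symbolic conditional expressions of the encoding.

```python
def unpack_T(ptr, storage):
    """Decode a path array into the storage entity of type T it denotes."""
    t1, s1, ss = storage["t1"], storage["s1"], storage["ss"]
    if ptr[0] == 0:
        return t1
    if ptr[0] == 1:
        return s1["t"] if ptr[1] == 0 else s1["ts"][ptr[2]]
    return ss[ptr[1]]["t"] if ptr[2] == 0 else ss[ptr[1]]["ts"][ptr[3]]


# Hypothetical concrete storage used to exercise the decoder.
storage = {
    "t1": {"z": 1},
    "s1": {"x": 0, "t": {"z": 2}, "ts": [{"z": 3}, {"z": 4}]},
    "ss": [{"x": 0, "t": {"z": 5}, "ts": [{"z": 6}]}],
}
```

In the SMT setting the same decision structure is a single nested conditional expression over a symbolic ptr, so the solver considers all admissible paths at once rather than one concrete path.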


Storage tree. The storage tree for a given type T can be easily obtained by 
filtering the AST nodes of the contract definition to only include state variable 
declarations and to, further, only include nodes that lead to a sub-node of type 
T. We denote the storage tree for type T as tree(T).⁹ 


Packing. Given an expression (such as ss[8].ts[5]), pack(.) uses the storage 
tree for the type of the expression and encodes it to an array (e.g., [2,8,1,5]) by 
fitting the expression into the tree. Pseudocode for pack(.) is shown in Figure 7. 
To start, the expression is decomposed into a list of base sub-expressions. The 
base expression of an identifier id is id itself. For an array index e[i] or a member 


9 In our implementation we do not explicitly compute the storage tree but instead 
traverse directly the AST provided by the Solidity compiler. 
def packpath(node, subExprs, d, result):
  foreach expr in subExprs do
    if expr = id ∨ expr = e.id then
      find edge node --id (i)--> child;
      result := result[d ← i];
    if expr = e[idx] then
      find edge node --(i)--> child;
      result := result[d ← $\mathcal{E}$(idx)];
    node, d := child, d + 1;
  return result

def pack(expr):
  baseExprs := list of base sub-expressions of expr;
  baseExpr := car(baseExprs);
  if baseExpr is a state variable then
    return packpath(tree(type(expr)), baseExprs, 0, constarr_[int]int(0))
  if baseExpr is a storage pointer then
    result := constarr_[int]int(0);
    prefix := $\mathcal{E}$(baseExpr);
    foreach path to a leaf in tree(type(baseExpr)) do
      pathResult, pathCond := prefix, true;
      foreach kth edge on the path with label id (i) do
        pathCond := pathCond ∧ prefix[k] = i
      pathResult := packpath(leaf, cdr(baseExprs), len(path), pathResult);
      result := ite(pathCond, pathResult, result);
    return result


Fig. 7: Packing of an expressions. It returns a symbolic array expression that, 
when evaluated, can identify the path to the storage entity that the expression 
references. 


access e.m_i, it is recursively the base expressions of e. We call the first element
of this list (denoted by car) the base expression (the innermost base expression). 
The base expression is always either a state variable or a storage pointer, and 
we consider these two cases separately. 

If the base expression is a state variable, we simply align the expression along 
the storage tree with the packpath function. The packpath function takes the 
list of base sub-expressions, and the storage tree to use for alignment, and then 
processes the expressions in order. If the current expression is an identifier (state 
variable or member access), the algorithm finds the outgoing edge annotated with 
the identifier (from the current node) and writes the index into the result array. 
If the expression is an index access, the algorithm maps and writes the index
expression (symbolically) into the array. The expression mapping function E(.) is
introduced later in Section 3.6.

If the base expression is a storage pointer, the process is more general since
the “start” of the packing must accommodate any point in storage to which the
base expression can point. In this case the algorithm finds all paths to leaves
in the tree of the base pointer, identifies the condition for taking each path,
and writes the labels on the path to an array. Then it uses packpath to continue
writing the array with the rest of the expression (denoted by cdr), as before.
Finally, a conditional expression is constructed with all the conditions and
packed arrays. Note that the type of this conditional is still an SMT array of
integers, as is the case for a single path.


Example 8. For the contract in Figure 6a, pack(ss[8].ts[5]) produces [2, 8, 1, 5] by
calling packpath on the base sub-expressions [ss, ss[8], ss[8].ts, ss[8].ts[5]].
First, 2 is added as ss is the state variable with index 2. Then, ss[8] is an index
access so 8 is mapped to 8 and added to the result. Next, ss[8].ts is a member
access with ts having the index 1. Finally, ss[8].ts[5] is an index access so 5
is mapped to 5 and added.
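The state-variable case of packing can be illustrated with a small Python sketch. It is not the authors' implementation: the dictionary-based tree encoding, the `(kind, value)` step tuples, and the use of concrete integers in a Python list (instead of symbolic terms written into an SMT array) are assumptions made for illustration. The shape of the tree is inferred from Examples 8 and 9.

```python
# Hypothetical encoding of the storage tree for type T, inferred from
# Examples 8 and 9: state variables t1 (0), s1 (1), ss (2); struct S has
# members t (0) and ts (1).
LEAF = {"kind": "leaf"}
STRUCT_S = {
    "kind": "struct",
    "edges": {"t": (0, LEAF), "ts": (1, {"kind": "array", "child": LEAF})},
}
TREE_T = {
    "kind": "contract",
    "edges": {
        "t1": (0, LEAF),
        "s1": (1, STRUCT_S),
        "ss": (2, {"kind": "array", "child": STRUCT_S}),
    },
}


def packpath(node, steps, result):
    """Append one entry per access step while walking down the tree."""
    for kind, value in steps:
        if kind == "id":
            # State variable or member access: write the edge index.
            index, child = node["edges"][value]
            result.append(index)
        else:
            # Index access: write the (here concrete) index itself.
            child = node["child"]
            result.append(value)
        node = child
    return result


# ss[8].ts[5] decomposed into its access steps.
steps = [("id", "ss"), ("index", 8), ("id", "ts"), ("index", 5)]
packed = packpath(TREE_T, steps, [])
print(packed)  # [2, 8, 1, 5]
```

The storage-pointer case would additionally enumerate all paths to leaves and guard each packed result with its path condition, which requires symbolic terms and is therefore left out of this sketch.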


def unpack(ptr):
    return unpack(ptr, tree(type(ptr)), empty, 0);

def unpack(ptr, node, expr, d):
    result := empty;
    if node has no outgoing edges then result := expr;
    if node is contract then
        foreach edge node --id (i)--> child do
            result := ite(ptr[d] = i, unpack(ptr, child, id, d + 1), result);
    if node is struct then
        foreach edge node --id (i)--> child do
            result := ite(ptr[d] = i, unpack(ptr, child, expr.id, d + 1), result);
    if node is array/mapping with edge node --(i)--> child then
        result := unpack(ptr, child, expr[ptr[d]], d + 1);
    return result;


Fig. 8: Unpacking of a local storage pointer into a conditional expression. 


Unpacking. The opposite of pack() is unpack(), shown in Figure 8. This function
takes a storage pointer (of type [int]int) and produces a conditional expression
that decodes any given path into one of the leaves of the storage tree. The
function recursively traverses the tree starting from the contract node and
accumulates the expressions leading to the leaves. The function creates
conditionals when branching, and when a leaf is reached the accumulated
expression is simply returned. For contracts we process edges corresponding to
each state variable by setting the subexpression to be the state variable
itself. For structs we process edges corresponding to each member by wrapping
the subexpression into a member access. For both contracts and structs, the
subexpressions are collected into a conditional as separate cases. For arrays
and mappings we process the single outgoing edge by wrapping the subexpression
into an index access using the current element (at index d) of the pointer.


Example 9. The conditional expression corresponding to the tree
in Figure 6b can be seen in Figure 6c. Given a pointer ptr, if ptr[0] = 0 then
the conditional evaluates to t1. Otherwise, if ptr[0] = 1 then s1 has to be taken,
where two leaves are possible: if ptr[1] = 0 then the result is s1.t, otherwise it is
s1.ts[ptr[2]], and so on. If ptr is [2, 8, 1, 5] then the conditional evaluates exactly
to ss[8].ts[5], from which ptr was packed.¹⁰
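The decoding described in Example 9 can be mimicked on concrete pointers with the following Python sketch. It is only an illustration under stated assumptions: the tree encoding and names are hypothetical, the pointer is a plain list of integers rather than an SMT array, the result is a string instead of an SMT expression, and the last edge of a contract or struct node is treated as the "otherwise" branch, as in the description of Example 9.

```python
# Hypothetical encoding of the storage tree inferred from Example 9.
LEAF = {"kind": "leaf"}
STRUCT_S = {
    "kind": "struct",
    "edges": [("t", 0, LEAF), ("ts", 1, {"kind": "array", "child": LEAF})],
}
TREE_T = {
    "kind": "contract",
    "edges": [
        ("t1", 0, LEAF),
        ("s1", 1, STRUCT_S),
        ("ss", 2, {"kind": "array", "child": STRUCT_S}),
    ],
}


def unpack(ptr, node=TREE_T, expr="", d=0):
    """Decode a concrete pointer into the storage expression it denotes."""
    if node["kind"] == "leaf":
        return expr
    if node["kind"] == "array":
        # Single outgoing edge: wrap into an index access using ptr[d].
        return unpack(ptr, node["child"], f"{expr}[{ptr[d]}]", d + 1)
    # Contract or struct: take the edge whose index equals ptr[d];
    # the last edge serves as the "otherwise" branch.
    edges = node["edges"]
    for name, index, child in edges[:-1]:
        if ptr[d] == index:
            break
    else:
        name, index, child = edges[-1]
    sub = name if node["kind"] == "contract" else f"{expr}.{name}"
    return unpack(ptr, child, sub, d + 1)


print(unpack([2, 8, 1, 5]))  # ss[8].ts[5]
print(unpack([1, 0]))        # s1.t
```

Because of the fall-through branch, a pointer such as `[9, 8, 1, 5]` decodes to the same entity as `[2, 8, 1, 5]` in this sketch.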


Note that with inheritance and libraries [30] it is possible that a contract 
defines a type T but has no nodes in its storage tree. The contract can still 
define functions with storage pointers to T, which can be called by derived 
contracts that define state variables of type T. In such cases we declare an array
of type [int]T(T), called the default context, and unpack storage pointers to T
as if the default context were a state variable. This allows us to reason about
abstract contracts and libraries, modeling that their storage pointers can point 
to arbitrary entities not yet declared. 


3.3 Contracts, State Variables, Functions 


The focus of our discussion is the Solidity memory model and, for presentation 
purposes, we assume a minimalist setting where the important aspects of storage 
and memory can be presented: we assume a single contract and a single function 
to translate. Interactions between multiple functions are handled differently
depending on the verification approach. For example, in modular verification
functions are checked individually against specifications (pre- and post-conditions)
and function calls are replaced by their specification [20].


State variables. Each state variable s_i of a contract is mapped to a variable
declaration s_i : T(type(s_i)) in the SMT program.¹¹ The data location of state
variables is always storage. As discussed previously, reference types are mapped
using SMT datatypes and arrays, which ensures non-aliasing by construction.
While Solidity optionally allows inline initializer expressions for state variables,
without loss of generality we can assume that they are initialized in the
constructor using regular assignments.


10 Note that due to the “else” branches, unpack is a non-injective surjective
function. For example, [a, 8, 1, 5] with any a ≥ 2 would evaluate to the same slot.
However, this does not affect our encoding as pointers cannot be compared and
pack always returns the same (unique) values.

11 Generalizing this to multiple contracts can be done directly by using a separate
one-dimensional heap for each state variable, indexed by a receiver parameter (this :
address) identifying the current contract instance (see, e.g., [20]).

defval(bool)             = false
defval(address)          = defval(int) = defval(uint) = 0

defval(mapping(K=>V))    = constarr_{[T(K)]T(V)}(defval(V))

defval(T[] storage)      = defval(T[0] storage)
defval(T[] memory)       = defval(T[0] memory)

defval(T[n] storage)     = StorArr_T(constarr_{[int]T(T)}(defval(T)), n)
defval(T[n] memory)      = [ref : int] (fresh symbol)
                           {ref := refcnt := refcnt + 1}
                           {arrheap_T[ref].length := n}
                           {arrheap_T[ref].arr[i] := defval(T)} for 0 ≤ i < n
                           ref

defval(struct S storage) = StorStruct_S(..., defval(S_i), ...)
defval(struct S memory)  = [ref : int] (fresh symbol)
                           {ref := refcnt := refcnt + 1}
                           {structheap_S[ref].m_i := defval(S_i)} for each m_i
                           ref


Fig. 9: Formalization of default values. We denote struct S members as m_i with
types S_i.


Function calls. From the perspective of the memory model, the only important
aspect of function calls is the way parameters are passed in and how function
return values are treated. Our formalization is general in that it allows us to
treat both of the above as plain assignments (explained later in Section 3.5).
For each parameter p_i and return value r_i of a function, we add declarations
p_i : T(type(p_i)) and r_i : T(type(r_i)) in the SMT program. Note that for reference
types appearing as parameters or return values of the function, their types are
either memory or storage pointers.


Memory allocation. In order to model allocation of new memory entities, while
keeping some non-aliasing information, we introduce an allocation counter
refcnt : int variable in the preamble of the SMT program. This counter is
incremented for each allocation of memory entities and used as the address of
the new entity. For each parameter p_i with memory data location we include an
assumption assume(p_i ≤ refcnt), as parameters can be arbitrary pointers but
should not alias with new allocations within the function. Note that if a
parameter of memory pointer type is a reference type containing other
references, such non-aliasing constraints need to be assumed recursively [25].
This can be done for structs by enumerating members. For dynamic arrays,
however, it requires quantification that is nevertheless still decidable (array
property fragment [13]).
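The role of the allocation counter can be seen in a toy Python model. This is a sketch under assumptions, not the encoding itself: the class name `MemoryModel`, the dictionary used as a heap, and the concrete addresses are all hypothetical, and the assumption on parameters is represented by simply choosing a parameter address that does not exceed the initial counter.

```python
class MemoryModel:
    """Toy allocator: refcnt holds the address of the latest allocation."""

    def __init__(self, refcnt=0):
        self.refcnt = refcnt
        self.heap = {}

    def alloc(self, value):
        # Mirrors ref := refcnt := refcnt + 1 followed by heap writes.
        self.refcnt += 1
        self.heap[self.refcnt] = value
        return self.refcnt


# The caller may already have allocated entities up to address 10.
mem = MemoryModel(refcnt=10)
param = 7  # memory-pointer parameter, assumed not to exceed the counter

a = mem.alloc([0, 0, 0])
b = mem.alloc([0, 0, 0])

# Fresh addresses are distinct from each other and from the parameter.
print(param, a, b)  # 7 11 12
```

Since every allocation strictly increases the counter, any pointer bounded by the counter's value at function entry cannot coincide with an address handed out later.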


Initialization and default values. If we are translating the constructor function,
each state variable s_i is first initialized to its default value with a statement
s_i := defval(type(s_i)). For regular functions, we set each return value r_i to its
default value with a statement r_i := defval(type(r_i)). We use defval(.), as defined
in Figure 9, to denote the function that maps a Solidity type to its default
value as an SMT expression. Note that, as a side effect, this function can do
allocations for memory entities, introducing extra declarations and statements,
denoted by [decl] and {stmt}. As expected, the default value is false for Booleans
and 0 for other primitives that map to integers. For mappings from K to V, the
default value is an SMT constant array returning the default value of the value
type V for each key k ∈ K (see, e.g., [16]). The default value of storage arrays
is the corresponding datatype value constructed with a constant array of the
default value for base type T, and a length of n or 0 for fixed- or
dynamically-sized arrays. For storage structs, the default value is the
corresponding datatype value constructed with the default values of each member.

The default value of uninitialized memory pointers is unusual. Since Solidity
doesn’t support “null” pointers, a new entity is automatically allocated in
memory and initialized to default values (which might include additional
recursive initialization). Note that for fixed-size arrays Solidity enforces
that the array size n must be an integer literal or a compile-time constant, so
setting each element to its default value is possible without loops or
quantifiers. Similarly for structs, each member is recursively initialized,
which is again possible by explicitly enumerating each member.
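The difference between storage and memory defaults can be sketched in Python as follows. This is a simplified illustration, not the SMT encoding: the tuple-based type descriptions, the global `heap` dictionary, and the finite Python list standing in for a constant array are assumptions, and mappings are omitted. Unlike Figure 9, nested memory entities are allocated before their parent here, which changes the addresses but not the structure.

```python
import itertools

heap = {}                     # memory heap: address -> entity
_refcnt = itertools.count(1)  # allocation counter (hypothetical name)


def defval(ty, loc="storage"):
    """Default value of a type given as a small tuple description."""
    kind = ty[0]
    if kind == "bool":
        return False
    if kind in ("int", "uint", "address"):
        return 0
    if kind == "array":
        # ("array", element_type, n); n is 0 for dynamically-sized arrays.
        _, elem, n = ty
        value = {"arr": [defval(elem, loc) for _ in range(n)], "length": n}
    elif kind == "struct":
        # ("struct", {member_name: member_type})
        value = {m: defval(t, loc) for m, t in ty[1].items()}
    else:
        raise ValueError(f"unsupported type: {kind}")
    if loc == "storage":
        return value          # storage entities are plain values
    ref = next(_refcnt)       # memory: allocate and return the address
    heap[ref] = value
    return ref


S = ("struct", {"x": ("int",), "flags": ("array", ("bool",), 2)})
in_storage = defval(S)
p = defval(S, "memory")
q = defval(S, "memory")
print(in_storage)
print(p != q, heap[p]["x"], heap[heap[p]["flags"]]["length"])
```

In the storage case the result is a nested value, while in the memory case each call yields a fresh address whose reference-typed members are themselves addresses into the heap.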


3.4 Statements 


We use S[.] to denote the function that translates Solidity statements to a list
of statements in the SMT program. It relies on the type mapping function T(.)
(presented previously in Section 3.1) and on the expression mapping function E(.)
(to be introduced in Section 3.6). Furthermore, we define a helper function A(., .)
dedicated to modeling Solidity assignments (to be discussed in Section 3.5).
The definition of S[.] is shown in Figure 10. As a side effect, extra declarations
can be introduced to the preamble of the SMT program (denoted by [decl]).
The Solidity documentation [30] does not precisely state the order of evaluating
subexpressions in statements. It only specifies that subnodes are processed before
the parent node. This problem is independent from the discussion of the memory
models, so we assume that side effects of subexpressions are added in the same
order as it is implemented in the compiler. Furthermore, if a subexpression is
mapped multiple times, we assume that the side effects are only added once.
This makes our presentation simpler by introducing fewer temporary variables.

Local variable declarations introduce a variable declaration with the same
identifier in the SMT program by mapping the type.¹² If an initialization
expression is given, it is mapped using E(.) and assigned to the variable. Otherwise,
the default value is used as defined by defval(.) in Figure 9. Delete assigns the
default value for a type, which is simply mapped to an assignment in our
formalization. Solidity supports multiple assignments as one statement with a
tuple-like syntax. The documentation [30] does not specify the behavior precisely, but the


12 Without loss of generality we assume that identifiers in Solidity are unique. The
compiler handles scoping and assigns a unique identifier to each declaration.

S[T id]                          = [id : T(T)]; A(id, defval(T))
S[T id = expr]                   = [id : T(T)]; A(id, E(expr))
S[delete e]                      = A(E(e), defval(type(e)))
S[l_1, ..., l_n = r_1, ..., r_n] = [tmp_i : T(type(r_i))] for 1 ≤ i ≤ n (fresh symbols)
                                   A(tmp_i, E(r_i)) for 1 ≤ i ≤ n
                                   A(E(l_i), tmp_i) for n ≥ i ≥ 1 (reversed)
S[e_1.push(e_2)]                 = A(E(e_1).arr[E(e_1).length], E(e_2))
                                   E(e_1).length := E(e_1).length + 1
S[e.pop()]                       = E(e).length := E(e).length - 1
                                   A(E(e).arr[E(e).length], defval(arrtype(E(e))))


Fig. 10: Formalization of statements.


contract C {
  struct S { int x; }
  S s1, s2, s3;
  function primitiveAssign() {
    s1.x = 1; s2.x = 2; s3.x = 3;
    (s1.x, s3.x, s2.x) = (s3.x, s2.x, s1.x);
    // s1.x == 3, s2.x == 1, s3.x == 2
  }
  function storageAssign() {
    s1.x = 1; s2.x = 2; s3.x = 3;
    (s1, s3, s2) = (s3, s2, s1);
    // s1.x, s2.x, s3.x are all equal to 1
  }
}


Fig. 11: Example illustrating the right-to-left assignment order and the
treatment of reference types in storage in tuple assignment.


contract C {
  struct S { int x; }
  S[] a;
  constructor() {
    a.push(S(1));
    S storage s = a[0];
    a.pop();
    assert(s.x == 1); // Ok
    // Following is error
    // assert(a[0].x == 1);
  }
}


Fig. 12: Example illustrating a dangling pointer to storage.


compiler first evaluates the RHS and LHS tuples (in this order) from left to right 
and then assignment is performed component-wise from right to left. 


Example 10. Consider the tuple assignment in function primitiveAssign() in
Figure 11. From right to left, s2.x is assigned first with the value of s1.x which 
is 1. Afterwards, when s3.x is assigned with s2.x, the already evaluated (old) 
value of 2 is used instead of the new value 1. Finally, s1.x gets the old value 
of s3.x, i.e., 3. Note however, that storage expressions on the RHS evaluate 
to storage pointers. Consider, for example, the function storageAssign() in 
Figure 11. From right to left, s2 is assigned first, with a pointer to s1 making 
s2.x become 1. However, as opposed to primitive types, when s3 is assigned 
next, s2 on the RHS is a storage pointer and thus the new value in the storage 
of s2 is assigned to s3 making s3.x become 1. Similarly, s1.x also becomes 1 
as the new value behind s3 is used. 
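The two outcomes in Example 10 can be reproduced with a short Python simulation. This is a sketch under assumptions rather than the formal translation: storage is modeled as a dictionary of struct values, a storage pointer is modeled simply by the name of the state variable it refers to, and `tuple_assign` is a hypothetical helper. It follows the order described above, evaluating the RHS from left to right into temporaries and then assigning component-wise from right to left.

```python
import copy


def tuple_assign(storage, lhs, rhs, by_pointer):
    """Evaluate rhs left to right, then assign to lhs right to left."""
    if by_pointer:
        # Storage expressions on the RHS evaluate to storage pointers.
        tmp = list(rhs)
    else:
        # Primitive values are snapshotted when the RHS is evaluated.
        tmp = [storage[name]["x"] for name in rhs]
    for target, value in reversed(list(zip(lhs, tmp))):
        if by_pointer:
            # Deep copy of whatever the pointer refers to at this moment.
            storage[target] = copy.deepcopy(storage[value])
        else:
            storage[target]["x"] = value
    return storage


def fresh():
    return {"s1": {"x": 1}, "s2": {"x": 2}, "s3": {"x": 3}}


lhs, rhs = ["s1", "s3", "s2"], ["s3", "s2", "s1"]
prim = tuple_assign(fresh(), lhs, rhs, by_pointer=False)
stor = tuple_assign(fresh(), lhs, rhs, by_pointer=True)
print([prim[s]["x"] for s in ("s1", "s2", "s3")])  # [3, 1, 2]
print([stor[s]["x"] for s in ("s1", "s2", "s3")])  # [1, 1, 1]
```

The only difference between the two runs is when the RHS contents are read: at evaluation time for primitive values, and at assignment time for storage pointers.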
Array push increases the length and assigns the given expression as the last
element. Array pop decreases the length and sets the removed element to its
default value. While the removed element can no longer be accessed via indexing
into an array (a runtime error occurs), it can still be accessed via local storage
pointers (see Figure 12).¹³


3.5 Assignments 


Assignments between reference types in Solidity can be either pointer
assignments or value assignments, involving deep copying and possible new
allocations in the latter case. We use A(lhs, rhs) to denote the function that
assigns a rhs SMT expression to a lhs SMT expression based on their original
types and data locations. The definition of A(.,.) is shown in Figure 13. Value
type assignments are simply mapped to an SMT assignment. To make our
presentation clearer, we subdivide the other cases into separate functions for
array, struct and mapping operands, denoted by A_A(.,.), A_S(.,.) and A_M(.,.)
respectively.


Mappings. As discussed previously, Solidity prohibits direct assignment of
mappings. However, it is possible to declare a storage pointer to a mapping, in
which case the RHS expression is packed. It is also possible to assign two
storage pointers, which simply assigns pointers. Other cases are a no-op.¹⁴


Structs and arrays. For structs and arrays the semantics of assignment is
summarized in Figure 14. However, there are some notable details in various cases
that we expand on below.

Assigning anything to storage LHS always causes a deep copy. If the RHS is
storage, this is simply mapped to a datatype assignment in our encoding (with
an additional unpacking if the RHS is a storage pointer).¹⁵ If the RHS is memory,
deep copy for structs can be done member-wise by accessing the heap with the
RHS pointer and performing the assignment recursively (as members can be
reference types themselves). For arrays, we access the datatype corresponding
to the array via the heap and do an assignment, which does a deep copy in
SMT. Note however, that this only works if the base type of the array is a
value type. For reference types, memory array elements are pointers and would
require being dereferenced during assignment to storage. As opposed to struct
members, the number of array elements is not known at compile time so loops or
quantifiers have to be used (as in traditional software analysis). However, this is a


13 The current version (0.5.x) of Solidity supports resizing arrays by assigning to
the length member. However, this behavior is dangerous and has been removed in
the next version (0.6.0) (see
https://solidity.readthedocs.io/en/v0.6.0/060-breaking-changes.html).
Therefore, we do not support this in our encoding.

14 This is a consequence of the fact that keys are not stored in mappings and so the
assignment is impossible to perform.

15 This also causes mappings to be copied, which contradicts the current semantics.
However, we chose to keep the deep copy as assignment of mappings is planned to
be disallowed in the future (see https://github.com/ethereum/solidity/issues/7739).

A(lhs, rhs)             = lhs := rhs        for value type operands
A(lhs, rhs)             = A_M(lhs, rhs)     for mapping type operands
A(lhs, rhs)             = A_S(lhs, rhs)     for struct type operands
A(lhs, rhs)             = A_A(lhs, rhs)     for array type operands

A_M(lhs : sp, rhs : s)  = lhs := pack(rhs)
A_M(lhs : sp, rhs : sp) = lhs := rhs
A_M(lhs, rhs)           = {}                (all other cases)

A_S(lhs : s, rhs : s)   = lhs := rhs
A_S(lhs : s, rhs : m)   = A(lhs.m_i, structheap_{type(rhs)}[rhs].m_i) for each m_i
A_S(lhs : s, rhs : sp)  = A_S(lhs, unpack(rhs))
A_S(lhs : m, rhs : m)   = lhs := rhs
A_S(lhs : m, rhs : s)   = lhs := refcnt := refcnt + 1
                          A(structheap_{type(lhs)}[lhs].m_i, rhs.m_i) for each m_i
A_S(lhs : m, rhs : sp)  = A_S(lhs, unpack(rhs))
A_S(lhs : sp, rhs : s)  = lhs := pack(rhs)
A_S(lhs : sp, rhs : sp) = lhs := rhs

A_A(lhs : s, rhs : s)   = lhs := rhs
A_A(lhs : s, rhs : m)   = lhs := arrheap_{type(rhs)}[rhs]
A_A(lhs : s, rhs : sp)  = A_A(lhs, unpack(rhs))
A_A(lhs : m, rhs : m)   = lhs := rhs
A_A(lhs : m, rhs : s)   = lhs := refcnt := refcnt + 1
                          arrheap_{type(lhs)}[lhs] := rhs
A_A(lhs : m, rhs : sp)  = A_A(lhs, unpack(rhs))
A_A(lhs : sp, rhs : s)  = lhs := pack(rhs)
A_A(lhs : sp, rhs : sp) = lhs := rhs


Fig. 13: Formalization of assignment based on different type categories and data
locations for the LHS and RHS. We use s, sp and m after the arguments to
denote storage, storage pointer and memory types respectively.


special case, which can be encoded in the decidable array property fragment [13].
Assigning storage (or storage pointer) to memory is also a deep copy but in
the other direction. However, instead of overwriting the existing memory entity, a
new one is allocated (recursively for reference typed elements or members). We
model this by incrementing the reference counter, storing it in the LHS and then
accessing the heap for deep copy using the new pointer.
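The case distinction summarized in Figure 14 can be written down as a lookup table, and two of its entries can be exercised on a toy heap. The Python below is a sketch under assumptions: the location names, the `SEMANTICS` dictionary, and the dictionary-based heap are hypothetical stand-ins for the SMT datatypes and heaps of the formalization.

```python
import copy

# Figure 14 as a lookup: (lhs location, rhs location) -> behaviour.
SEMANTICS = {
    ("storage", "storage"): "deep copy",
    ("storage", "memory"): "deep copy",
    ("storage", "storptr"): "deep copy",
    ("memory", "storage"): "deep copy",
    ("memory", "memory"): "pointer assign",
    ("memory", "storptr"): "deep copy",
    ("storptr", "storage"): "pointer assign",
    ("storptr", "memory"): "error",
    ("storptr", "storptr"): "pointer assign",
}

heap = {1: {"x": 5}}   # one memory struct at address 1
refcnt = 1             # allocation counter
storage_s = {"x": 9}   # a storage struct, modeled as a plain value

# memory := memory is a pointer assignment, so both names alias.
m1 = 1
m2 = m1
heap[m2]["x"] = 6

# memory := storage allocates a new entity and deep-copies into it.
refcnt += 1
m3 = refcnt
heap[m3] = copy.deepcopy(storage_s)
heap[m3]["x"] = 0      # does not affect the storage struct

print(heap[m1]["x"], storage_s["x"], heap[m3]["x"])  # 6 9 0
```

The write through `m2` is visible through `m1`, whereas the write to the freshly allocated copy leaves the storage struct untouched.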


3.6 Expressions 


We use E(.) to denote the function that translates a Solidity expression to an
SMT expression. As a side effect, declarations and statements might be
introduced (denoted by [decl] and {stmt} respectively). The definition of E(.) is shown
in Figure 15. As discussed in Section 3.4 we assume that side effects are added
from subexpressions in the proper order and only once.

Member access is mapped to an SMT member access by mapping the base
expression and the member name. There is an extra unpacking step for storage

lhs/rhs    | Storage        | Memory         | Stor. ptr.
Storage    | Deep copy      | Deep copy      | Deep copy
Memory     | Deep copy      | Pointer assign | Deep copy
Stor. ptr. | Pointer assign | Error          | Pointer assign


Fig. 14: Semantics of assignment between array and struct operands based on
their data location.


E(id)        = id

E(expr.id)   = E(expr).E(id)                    if type(expr) = struct S storage
E(expr.id)   = unpack(E(expr)).E(id)            if type(expr) = struct S storptr
E(expr.id)   = structheap_S[E(expr)].E(id)      if type(expr) = struct S memory
E(expr.id)   = E(expr).E(id)                    if type(expr) = T[] storage
E(expr.id)   = unpack(E(expr)).E(id)            if type(expr) = T[] storptr
E(expr.id)   = arrheap_T[E(expr)].E(id)         if type(expr) = T[] memory

E(expr[idx]) = E(expr).arr[E(idx)]              if type(expr) = T[] storage
E(expr[idx]) = unpack(E(expr)).arr[E(idx)]      if type(expr) = T[] storptr
E(expr[idx]) = arrheap_T[E(expr)].arr[E(idx)]   if type(expr) = T[] memory
E(expr[idx]) = E(expr)[E(idx)]                  if type(expr) = mapping(K=>V) storage
E(expr[idx]) = unpack(E(expr))[E(idx)]          if type(expr) = mapping(K=>V) storptr

E(cond ? expr_T : expr_F) = [var_T : T(type(cond ? expr_T : expr_F))] (fresh symbol)
                            [var_F : T(type(cond ? expr_T : expr_F))] (fresh symbol)
                            {A(var_T, E(expr_T))}
                            {A(var_F, E(expr_F))}
                            ite(E(cond), var_T, var_F)

E(new T[](expr))          = [ref : int] (fresh symbol)
                            {ref := refcnt := refcnt + 1}
                            {arrheap_T[ref].length := E(expr)}
                            {arrheap_T[ref].arr[i] := defval(T)} for 0 ≤ i < E(expr)
                            ref
E(S(..., expr_i, ...))    = [ref : int] (fresh symbol)
                            {ref := refcnt := refcnt + 1}
                            {structheap_S[ref].m_i := E(expr_i)} for each member m_i
                            ref


Fig. 15: Formalization of expressions. We denote struct S members as m_i with
types S_i.


pointers and a heap access for memory. Note that the only valid member for
arrays is length. Index access is mapped to an SMT array read by mapping the
base expression and the index, and adding an extra member access for arrays to
get the inner array arr of elements from the datatype. Furthermore, similarly to
member accesses, an extra unpacking step is needed for storage pointers and a
heap access for memory.

Conditionals in Solidity can be mapped to an SMT conditional in general.
However, data locations can be different for the true and false branches, causing
possible side effects. Therefore, we first introduce fresh variables for the true
and false branch with the common type (of the whole conditional), then make
assignments using A(.,.) and finally use the new variables in the conditional. The
documentation [30] does not specify the common type, but the compiler returns
memory if any of the branches is memory, and storage pointer otherwise.

Allocating a new array in memory increments the reference counter, sets the
length and the default values for each element (recursively). Note that in general
the length might not be a compile-time constant, in which case setting default
values could be encoded with the array property fragment (similarly to deep
copy in assignments) [13]. Allocating a new memory struct also increments the
reference counter and sets each value by translating the provided arguments.


4 Evaluation 


The formalization described in this paper serves as the basis of our Solidity
verification tool SOLC-VERIFY [20].¹⁶ In this section we provide an evaluation of
the presented formalization and our implementation by validating it on a set of
relevant test cases. For illustrative purposes we also compare our tool with other
available Solidity analysis tools.¹⁷

“Real world” contracts currently deployed on Ethereum (e.g., contracts
available on Etherscan) have limited value for evaluating memory model semantics.
Many such contracts use old compiler versions with constructs that are no longer
supported, and do not use newer features. There are also many toy and
trivial contracts that are deployed but not used, and popular contracts (e.g.,
tokens) are over-represented with many duplicates. Furthermore, the
inconsistent usage of assert and require [20] makes evaluation hard. Evaluating the
memory semantics requires contracts that exercise diverse features of the
memory model. There are larger dApps that do use more complex features (e.g.,
Augur or ENS), but these contracts also depend on many other features (e.g.,
inheritance, modifiers, loops) that would skew the results.

Therefore we have manually developed a set of tests that try to capture 
the interesting behaviors and corner cases of the Solidity memory semantics. 
The tests are targeted examples that do not use irrelevant features. The set 
is structured so that every target test behavior is represented with a test case 
that sets up the state, exercises a specific feature and checks the correctness 
of the behavior with assertions. This way a test should only pass if the tool 
provides a correct verification result by modeling the targeted feature precisely. 


16 SOLC-VERIFY is open source, available at https://github.com/SRI-CSL/solidity.
Besides certain low-level constructs (such as inline assembly), SOLC-VERIFY supports
a majority of the Solidity features that we omitted from the presentation, including
inheritance, function modifiers, for/while loops and if-then-else.

17 All tests, with a Truffle test harness, a docker container with all the tools, and all
individual results are available at https://github.com/dddejan/solidity-semantics-tests.

The correctness of the tests themselves is determined by running them through
the EVM with no assertion failures. Test cases are expanded to use all reference
types and combinations of reference types. This includes structures, mappings,
dynamic and fixed-size arrays, both single- and multi-dimensional.


The tests are organized into the following classes. Tests in the assignment 
class check whether the assign statement is properly modeled. This includes 
assignments in the same data location, but also assignments across data locations 
that need deep copying, and assignments and re-assignments of memory and 
storage pointers. The delete class of tests checks whether the delete statement 
is properly modeled. Tests in the init class check whether variable and data 
initialization is properly modeled. For variables in storage, we check if they are 
properly initialized to default values in the contract constructor. Similarly, we 
check whether memory variables are properly initialized to provided values, or 
default values when no initializer is provided. The storage class of tests checks 
whether storage itself is properly modeled for various reference types, including 
for example non-aliasing. Tests in the storageptr class check whether storage 
pointers are modeled properly. This includes checking if the model properly 
treats storage pointers to various reference types, including nested types. In 
addition, the tests check that the storage pointers can be properly passed to 
functions and ensure non-aliasing for distinct parts of storage. 


For illustrative purposes we include a comparison with the following avail- 
able Solidity analysis tools: MYTHRIL v0.21.17 [29], VERISOL v0.1.1-alpha [24], 
and SMT-CHECKER v0.5.12 [1]. MYTHRIL is a Solidity symbolic execution tool 
that runs analysis at the level of the EVM bytecode. VERISOL is similar to 
SOLC-VERIFY in that it uses Boogie to model the Solidity contracts, but takes 
the traditional approach to modeling memory and storage with pointers and 
quantifiers. SMT-CHECKER is an SMT-based analysis module built into the So- 
lidity compiler itself. There are other tools that can be found in the literature, 
but they are either basic prototypes that cannot handle realistic features we are 
considering, or are not available for direct comparison. 


We ran the experiments on a machine with Intel Xeon E5-4627 v2 @ 3.30GHz 
CPU enforcing a 60s timeout and a memory limit of 64GB. Results are shown in 
Table 1. As expected, MYTHRIL has the most consistent results on our test set. 
This is because MYTHRIL models contract semantics at the EVM level and does 
not need to model complex Solidity semantics. Nevertheless, the results also in- 
dicate that the performance penalty for this precision is significant (8 timeouts). 
VERISOL, as the closest to our approach, still doesn’t support many features and 
has a significant amount of false reports for features that it does support. Many 
false reports are because their model of storage is based on pointers and tries 
to ensure storage consistency with the use of quantifiers. SMT-CHECKER doesn’t 
yet support the majority of the Solidity features that our tests target. 


Based on the results, SOLC-VERIFY performs well on our test set, matching
the precision of MYTHRIL at very low computational cost. The few false alarms
we have are either due to Solidity features that we chose to not implement (e.g.,
proper treatment of mapping assignments), or parts of the semantics that we

Table 1: Results of evaluating MYTHRIL, VERISOL, SMT-CHECKER, and SOLC-VERIFY
on our test suite.

assignment (102) | correct | incorrect | unsupported | timeout | time (s)
MYTHRIL          |      94 |         0 |           0 |       8 |  1655.14
VERISOL          |      10 |        61 |          31 |       0 |   175.27
SMT-CHECKER      |       6 |         9 |          87 |       0 |    15.25
SOLC-VERIFY      |      78 |         8 |          16 |       0 |    62.81

delete (14)      | correct | incorrect | unsupported | timeout | time (s)
MYTHRIL          |      13 |         1 |           0 |       0 |    47.51
VERISOL          |       3 |         8 |           3 |       0 |    24.66
SMT-CHECKER      |       0 |         0 |          14 |       0 |     0.30
SOLC-VERIFY      |       7 |         1 |           6 |       0 |     9.02

init (18)        | correct | incorrect | unsupported | timeout | time (s)
MYTHRIL          |      15 |         3 |           0 |       0 |    59.67
VERISOL          |       7 |         8 |           3 |       0 |    28.82
SMT-CHECKER      |       0 |         0 |          18 |       0 |     0.41
SOLC-VERIFY      |      13 |         5 |           0 |       0 |    11.88

storage (27)     | correct | incorrect | unsupported | timeout | time (s)
MYTHRIL          |      27 |         0 |           0 |       0 |   310.40
VERISOL          |      12 |        15 |           0 |       0 |    43.45
SMT-CHECKER      |       2 |         0 |          25 |       0 |     1.32
SOLC-VERIFY      |      27 |         0 |           0 |       0 |    17.61

storageptr (164) | correct | incorrect | unsupported | timeout | time (s)
MYTHRIL          |     164 |         0 |           0 |       0 |  1520.29
VERISOL          |     128 |        19 |          17 |       0 |   203.93
SMT-CHECKER      |       4 |        18 |         142 |       0 |    21.93
SOLC-VERIFY      |     164 |         0 |           0 |       0 |    96.92


only implemented partially (such as deep copy of arrays with reference types 
and recursively initializing memory objects). There are no technical difficulties 
in supporting them and they are planned in the future. 
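For an overall picture, the per-class rows of Table 1 can be aggregated per tool. The short Python script below is only a convenience for the reader: the data are transcribed from Table 1, while the `TABLE1` layout and the `totals` helper are hypothetical names introduced here. The aggregate figures are derived by summation and are not reported in this form in the evaluation itself.

```python
# Rows per tool, one tuple per test class in the order assignment, delete,
# init, storage, storageptr:
# (correct, incorrect, unsupported, timeout, time in seconds), from Table 1.
TABLE1 = {
    "MYTHRIL": [(94, 0, 0, 8, 1655.14), (13, 1, 0, 0, 47.51),
                (15, 3, 0, 0, 59.67), (27, 0, 0, 0, 310.40),
                (164, 0, 0, 0, 1520.29)],
    "VERISOL": [(10, 61, 31, 0, 175.27), (3, 8, 3, 0, 24.66),
                (7, 8, 3, 0, 28.82), (12, 15, 0, 0, 43.45),
                (128, 19, 17, 0, 203.93)],
    "SMT-CHECKER": [(6, 9, 87, 0, 15.25), (0, 0, 14, 0, 0.30),
                    (0, 0, 18, 0, 0.41), (2, 0, 25, 0, 1.32),
                    (4, 18, 142, 0, 21.93)],
    "SOLC-VERIFY": [(78, 8, 16, 0, 62.81), (7, 1, 6, 0, 9.02),
                    (13, 5, 0, 0, 11.88), (27, 0, 0, 0, 17.61),
                    (164, 0, 0, 0, 96.92)],
}


def totals(rows):
    """Return (correct results, number of tests, total time) for one tool."""
    correct = sum(r[0] for r in rows)
    tests = sum(sum(r[:4]) for r in rows)
    seconds = sum(r[4] for r in rows)
    return correct, tests, seconds


for tool, rows in TABLE1.items():
    correct, tests, seconds = totals(rows)
    print(f"{tool:12s} {correct:3d}/{tests} correct, {seconds:8.2f} s")
```

Summed over all 325 tests, this gives 313 correct results for MYTHRIL in roughly 3593 s and 289 correct results for SOLC-VERIFY in roughly 198 s.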


5 Related Work 


There is a strong push in the Ethereum community to apply formal methods 
to smart contract verification. This includes many attempts to formalize the 
semantics of smart contracts, both at the level of EVM and Solidity. 


EVM-level semantics. Bhargavan et al. [11] decompile a fragment of EVM to F*,
modeling EVM as a stack-based machine with word and byte arrays for storage
and memory. Grishchenko et al. [19] extend this work by providing a small-step
semantics for EVM. KEVM [21] provides an executable formal semantics of
EVM in the K framework. Hirai [22] formalizes EVM in Lem, a language used by
some interactive theorem provers. Amani et al. [2] extend this work by defining
a program logic to reason about EVM bytecode.


Solidity-level semantics. Jiao et al. [23] formalize the operational semantics of 
Solidity in the K framework. Their formalization focuses on the details of bit- 
precise sizes of types, alignment and padding in storage. They encode storage 
slots, arrays and mappings with the full encoding of hashing. However, the for- 
malization does not describe assignments (e.g., deep copy) apart from simple 
cases. Furthermore, user defined structs are also not mentioned. In contrast, our 
semantics is high-level and abstracts away some details (e.g., hashes, alignments) 
to enable efficient verification. Additionally, we provide proper modeling of different
cases for assignments between storage and memory. Bartoletti et al. [10]
propose TINYSOL, a minimal core calculus for a subset of Solidity, required to
model basic features such as asset transfer and reentrancy. Contract data is mod- 
eled as a key value store, with no differences in storage and memory, or in value 
and reference types. Crafa et al. [15] introduce Featherweight Solidity, a calculus 
formalizing core features of the language, with focus on primitive types. Data 
locations and reference types are not discussed, only mappings are mentioned 
briefly. The main focus is on the type system and type checking. They propose an 
improved type system that can statically detect unsafe casts and callbacks. The 
closest to our work is the work of Zakrzewski [33], a Coq formalization focusing 
on functions, modifiers, and the memory model. The memory model is treated 
similarly: storage is a mapping from names to storage objects (values), memory is 
a mapping from references to memory objects (containing references recursively) 
and storage pointers define a path in storage. Their formalization is also high- 
level, without considering alignment, padding or hashing. The formalization is 
provided as big-step functional semantics in Coq. While the paper presents some
example rules, the formalization does not cover all cases: for example, it omits
the details of assignments (e.g., memory to storage), push/pop for arrays, the
treatment of memory aliasing, and new expressions. Furthermore, our approach
focuses on SMT and modular verification, which enables automated reasoning.


6 Conclusion 


We presented a high-level SMT-based formalization of the Solidity memory 
model semantics. Our formalization covers all aspects of the language related to 
managing both the persistent contract storage and the transient local memory. 
The novel encoding of storage pointers as arrays allows us to precisely model non- 
aliasing and deep copy assignments between storage entities without the need 
for quantifiers. The memory model forms the basis of our Solidity-level modular 
verification tool SOLC-VERIFY. We developed a suite of test cases exercising all 
aspects of memory management with different combinations of reference types. 
Results indicate that our memory model outperforms existing Solidity-level tools 
in terms of soundness and precision, and is on par with low-level EVM-based 
implementations, while having a significantly lower computational cost for dis- 
charging verification conditions. 
Abstract. A key open problem with multiparty session types (MPST)
concerns their expressiveness: current MPST have inflexible choice, no
existential quantification over participants, and limited parallel composition.
This precludes many real protocols from being represented by MPST.
To overcome these bottlenecks of MPST, we explore a new technique 
using weak bisimilarity between global types and endpoint types, which 
guarantees deadlock-freedom and absence of protocol violations. Based 
on a process algebraic framework, we present well-formedness conditions for
global types that guarantee weak bisimilarity between a global type and
its endpoint types, and prove that checking them is decidable. Our main practical
result, obtained through benchmarks, is that our well-formedness condi- 
tions can be checked orders of magnitude faster than directly checking 
weak bisimilarity using a state-of-the-art model checker. 


1 Introduction 


Background. To take advantage of modern parallel and distributed comput- 
ing platforms, message-passing concurrency is becoming increasingly important. 
Modern programming languages, however, offer insufficiently effective linguistic 
support to guide programmers towards safe usage of message-passing abstrac- 
tions (e.g., to prevent deadlocks or protocol violations). 

Fig. 1: MPST framework. A global type $G$ is projected onto local types
$L_1 \parallel L_2 \parallel \cdots \parallel L_n$; each process $P_i$ in
$P_1 \parallel P_2 \parallel \cdots \parallel P_n$ is type-checked against local type $L_i$.

Multiparty session types (MPST) [34] constitute a static, correct-by-construction
approach to simplify concurrent programming, by offering a type-based framework
to specify message-passing protocols and ensure deadlock-freedom and protocol
conformance. The idea is to use behavioural types [1,37] to enforce protocols
(i.e., patterns of admissible communications) between roles (e.g., threads,
processes, services) to avoid concurrency bugs. The framework is illustrated in
Fig. 1: first, a global type $G$ (protocol specification; written by the programmer)
is projected onto every role; then, every resulting endpoint type (local type) $L_i$
(role specification) is type-checked with the corresponding process $P_i$ (role
implementation). If every process is well-typed against its local type, then
their parallel composition is guaranteed to be
free of deadlocks and protocol violations relative to the global type. Notably, 
common concurrency bugs such as sends without receives, receives without sends,
and type mismatches (actual type sent vs. expected type received) are ruled 
out statically. The MPST framework is language-agnostic: in recent years, prac- 
tical implementations of MPST have been developed for several programming 
languages, including Erlang, F#, Go, Java, and Scala [18,35,36,45,46,50]. 


Three open problems. Many practically relevant protocols cannot be spec- 
ified as global types; this limits MPST’s applicability to real-world concurrent 
programs. Specifically, while the original work [33] has been extended with sev- 
eral advanced features (e.g., time [7,44], security [11,12,13,17], and parametrisa- 
tion [18,25,47]), core features still have significant restrictions: inflexible choice, 
no existential quantification over participants, and limited parallel composition. 

1. Inflexible choice: In the original work [33], if there is a choice between 
multiple branches, the sender in the first communication of each branch must be 
the same, the receiver must be the same, and the message type must be different 
(i.e., no non-determinism). Moreover, each role not involved in the first commu- 
nication of each branch, must have the same behaviour in each continuation. For 
instance, the following global type specifies a protocol where Client c repeatedly 
requests an arithmetic Server s to compute the sum or product of two numbers: 


\[ \mu X.\, [[c \to s : \mathsf{Add} \cdot s \to c : \mathsf{Sum} \cdot X] + [c \to s : \mathsf{Mul} \cdot s \to c : \mathsf{Prod} \cdot X]] \]


Here, $c \to s : \mathsf{Add}$ specifies a communication of an Add-message (with two numbers
as payload) from the Client to the Server, while $\cdot$ and $+$ specify sequencing and
branching, and square brackets indicate operator precedence. This is a “good”
global type that satisfies the conditions. In contrast, the following “bad” global
type specifies a protocol where Client $c$ repeatedly requests addition and
multiplication Servers $s_1$ and $s_2$ via Router $r$ (payload types omitted;
$r_1 \to r_2 \to r_3 : t$ abbreviates $r_1 \to r_2 : t \cdot r_2 \to r_3 : t$):

\[ \mu X.\, [[c \to r \to s_1 : \mathsf{Add} \cdot s_1 \to c : \mathsf{Sum} \cdot X] + [c \to r \to s_2 : \mathsf{Mul} \cdot s_2 \to c : \mathsf{Prod} \cdot X]] \]

Several improvements to the original work have been proposed: Honda et al.
managed to allow each role $r$ not involved in a choice to have different behaviour
in different branches [15], so long as $r$ is made aware of which branch is chosen in a
timely and unambiguous fashion (e.g., the previous global type is still forbidden),
while Lange et al., Castagna et al., and Hu & Yoshida managed to allow choices
between different receivers [16,23,36,40]. For instance, the following global type
(the Client directly requests the specialised server) is allowed:

\[ \mu X.\, [[c \to s_1 : \mathsf{Add} \cdot s_1 \to c : \mathsf{Sum} \cdot X] + [c \to s_2 : \mathsf{Mul} \cdot s_2 \to c : \mathsf{Prod} \cdot X]] \]

But the following global type (two Clients $c_1$ and $c_2$ use Server $s$) is forbidden:

\[
\mu X.\, \left[
\begin{array}{l}
[c_1 \to s : \mathsf{Add} \cdot s \to c_1 : \mathsf{Sum} \cdot X] + [c_1 \to s : \mathsf{Mul} \cdot s \to c_1 : \mathsf{Prod} \cdot X] + {} \\
{} [c_2 \to s : \mathsf{Add} \cdot s \to c_2 : \mathsf{Sum} \cdot X] + [c_2 \to s : \mathsf{Mul} \cdot s \to c_2 : \mathsf{Prod} \cdot X]
\end{array}
\right]
\]
None of the existing works allow the above nondeterministic choices between
different senders. We call this the +-problem: how to add a choice constructor, 
denoted by +, to specify choices between disjoint sender-receiver-label triples? 
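The restriction on the first communication of each branch can be made concrete with a small check. The following Python sketch is only an illustration under stated assumptions: the function name `classic_choice_ok` and the triple encoding `(sender, receiver, label)` are hypothetical, and the check covers only the "same sender, same receiver, distinct labels" part of the original conditions, not the requirement on roles that are not involved in the choice.

```python
from typing import List, Tuple

# Hypothetical encoding: the first communication of a branch as
# (sender, receiver, label).
Comm = Tuple[str, str, str]


def classic_choice_ok(first_comms: List[Comm]) -> bool:
    """Check the original restriction on the first communication of each
    branch: one sender, one receiver, pairwise distinct labels."""
    senders = {s for s, _, _ in first_comms}
    receivers = {r for _, r, _ in first_comms}
    labels = [l for _, _, l in first_comms]
    return (len(senders) == 1 and len(receivers) == 1
            and len(set(labels)) == len(labels))


# The "good" arithmetic protocol: Client c chooses between Add and Mul.
good = [("c", "s", "Add"), ("c", "s", "Mul")]

# Two Clients c1 and c2 using Server s: choice between different senders.
two_clients = [("c1", "s", "Add"), ("c1", "s", "Mul"),
               ("c2", "s", "Add"), ("c2", "s", "Mul")]

print(classic_choice_ok(good))         # True
print(classic_choice_ok(two_clients))  # False
```

The second call fails because two distinct senders occur, which is exactly the situation the +-problem asks to support.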

2. No existential quantification: Related to the +-problem is the $\exists$-problem:
how to add an existential role quantifier, denoted by $\exists$, to specify
the execution of $\exists$'s body for some role in $\exists$'s domain? For instance, instead
of writing a separate global type for 2 Clients, 3 Clients, etc., existential role
quantification allows us to write only one global type for any $n > 1$ Clients:

\[ \mu X.\, \exists r \in \{c_i \mid 1 \le i \le n\}.\, [[r \to s : \mathsf{Add} \cdot s \to r : \mathsf{Sum} \cdot X] + [r \to s : \mathsf{Mul} \cdot s \to r : \mathsf{Prod} \cdot X]] \]

The $\exists$-problem was first formulated by Deniélou & Yoshida [22] as the dual of the
$\forall$-problem (i.e., specify the execution of $\forall$'s body for each role in $\forall$'s domain):
the $\forall$-problem was solved in the same paper, but the $\exists$-problem “raises many
semantic issues” [22] and has remained open for almost a decade.

3. Limited parallel composition: The third open problem related to 
choice is the ||-problem: how to add a constructor, denoted by ||, that allows 
infinite branching (i.e., non-finite control) through unbounded parallel inter- 
leaving? While extensions of the original work with parallel composition exist 
(e.g., [16,22,23,43]), none of these works supports unbounded interleaving. For 
instance, the following global type allows an unbounded number of requests to 
be served by the Server in parallel (instead of sequentializing them): 


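\[ \mu X.\, \exists r \in \{c_i \mid 1 \le i \le n\}.\, [[r \to s : \mathsf{Add} \cdot [s \to r : \mathsf{Sum} \parallel X]] + [r \to s : \mathsf{Mul} \cdot [s \to r : \mathsf{Prod} \parallel X]]] \]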


Contributions. We overcome these three bottlenecks of MPST with an ap- 
proach based on three key novelties: first, we have a new definition of projection 
that keeps more information in the local types than existing definitions; second, 
we exploit this extra information to formulate our well-formedness conditions; 
third, we use an unexplored proof method for MPST, namely to prove the op- 
erational equivalence between a global type and its projections modulo weak 
bisimilarity. This makes the proofs cleaner and ultimately allows for more flex- 
ibility (e.g., our approach can be modularly combined with traditional session 
type checking, but potentially also with other verification methods, such as model 
checking or conformance testing). To summarise the highlights: 


— For the first time, we provide solutions to the +-problem, the $\exists$-problem,
and the ||-problem, by presenting expressive syntax for global and local types 
(formulated as process algebraic terms), a refined notion of projection, and 
novel well-formedness conditions. 

— Our main theoretical result is operational equivalence: a well-formed global 
type behaves the same as the parallel composition of its projections, modulo 
weak bisimulation. This implies freedom of deadlocks and freedom of protocol 
violations of the projections. Checking this equivalence is decidable. 

To our knowledge, we are the first to use (weak) bisimilarity to prove the 
correctness of a projection operator from global to local types. By doing so, 


254 S. Jongmans and N. Yoshida 


Fig. 2: Example executions of the Key-Value Store protocol (message sequence
charts between Client 1, Server, and Client 2): (a) valid execution;
(b) invalid execution; (c) invalid execution.


we decouple (a) the act of reasoning about projection and (b) the act of 
establishing compliance between local types and process implementations; 
until our work, these two concerns have always been conflated. 

— Our main practical results are: (1) we provide representative protocols
typable in our approach; and (2) we show that the well-formedness conditions
of these protocols can be checked orders of magnitude faster than directly
checking weak bisimilarity using mCRL2 [10,20,29], a state-of-the-art model checker.


In Sect. 2, we present an overview of our contribution through a representative 
example protocol that is not supported by previous work. In Sect. 3, we present 
the details of our theoretical contribution. In Sect. 4, we present the details of our 
practical contribution (implementation and evaluation). In Sect. 5, we discuss 
related work. We conclude and discuss future work in Sect. 6. 

Detailed formal definitions and proofs of all lemmas and theorems can be 
found in our supplement [38]. 


2 Overview of our Approach 


Scenario. To highlight our solutions to the +-problem, $\exists$-problem, and
||-problem, we consider a Key-Value Store protocol, similar to those used in modern
NoSQL databases [21,27]. Specifically, our Key-Value Store protocol is inspired
by the transaction mechanism of the popular Redis database [48,49]. This
protocol is not supported by any of the existing MPST works.

The Key-Value Store protocol consists of $n$ Clients that require access to the
store, represented by role names $c_1, \ldots, c_n$, and one Server that provides access to
the store, represented by role name $s$. The store has keys of type Str (strings) and
values of type Nat (numbers). Fig. 2 shows valid and invalid example executions
of the protocol ($n = 2$) as message sequence charts; it works as follows.

First, a Lock-message is communicated from some Client $c_i$ ($1 \le i \le n$) to Server
$s$ (Fig. 2a, arrows 1, 5); this grants $c_i$ exclusive access to the store. Then, a
sequence of messages to write and/or read values is communicated:


— To write, a Set-message is communicated from $c_i$ to $s$ (arrows 2, 3, 11).

— To read, a Get-message is communicated from $c_i$ to $s$ (arrows 6, 7). Then,
eventually, a Value-message is communicated from $s$ to $c_i$ (arrows 8, 10), but
in the meantime, additional Get-messages can be communicated from $c_i$ to
$s$. In this way, the Client does not need to await the responses of the Server
to perform multiple independent requests. To indicate enough Get-messages
have been sent, a Barrier-message is communicated from $c_i$ to $s$ (arrow 9),
which serves as a communication fence: the protocol will only proceed once
all Value-messages for pending Get-messages have been communicated.


The sequence ends with the communication of an Unlock-message from $c_i$ to $s$
(arrow 12). The protocol is then repeated for some Client $c_j$ ($1 \le j \le n$); possibly,
but not necessarily, $i = j$. In this way, the Server atomically processes accesses to
the store between Lock/Unlock-messages. 
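The informal description above can be read as a small state machine over message traces. The following Python sketch is an illustration only, not the formalization used in this paper: the function `valid_trace`, the triple encoding `(sender, receiver, label)`, and the assignment of messages to Clients in the example trace are assumptions made here. The function accepts every prefix of a valid execution, so a trace that ends while the lock is still held is not rejected.

```python
from typing import Iterable, Optional, Tuple

Msg = Tuple[str, str, str]  # hypothetical encoding: (sender, receiver, label)


def valid_trace(trace: Iterable[Msg], server: str = "s") -> bool:
    """Return True if the trace is a prefix of a valid protocol execution."""
    holder: Optional[str] = None  # Client that currently holds the lock
    pending = 0                   # Get-messages still awaiting a Value-message
    reading = False               # inside a Get/Barrier phase
    barrier = False               # Barrier already sent in this phase
    for sender, receiver, label in trace:
        if holder is None:
            # Only a Lock-message from a Client to the Server starts a session.
            if label != "Lock" or receiver != server or sender == server:
                return False
            holder = sender
            continue
        to_server = (sender, receiver) == (holder, server)
        to_client = (sender, receiver) == (server, holder)
        if label == "Value" and to_client and pending > 0:
            pending -= 1
        elif label == "Get" and to_server and not barrier:
            reading, pending = True, pending + 1
        elif label == "Barrier" and to_server and not barrier:
            reading, barrier = True, True
        elif label == "Set" and to_server and not reading:
            pass
        elif label == "Unlock" and to_server and not reading:
            holder = None
        else:
            return False
        # The read phase ends once the Barrier was sent and no Value is pending.
        if barrier and pending == 0:
            reading = barrier = False
    return True


# A trace following the message order of Fig. 2a (Client names assumed).
fig2a = [("c1", "s", "Lock"), ("c1", "s", "Set"), ("c1", "s", "Set"),
         ("c1", "s", "Unlock"), ("c2", "s", "Lock"), ("c2", "s", "Get"),
         ("c2", "s", "Get"), ("s", "c2", "Value"), ("c2", "s", "Barrier"),
         ("s", "c2", "Value"), ("c2", "s", "Set"), ("c2", "s", "Unlock")]
print(valid_trace(fig2a))  # True
```

A Set-message from a Client that does not hold the lock, or an Unlock-message while a Value-message is still pending, makes the function return False.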


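Global and local types. The corresponding global type and local types, inferred
via projection (for some $n$), are as follows:

\[
\begin{array}{rcl}
G &=& \mu X.\, \exists r \in \{c_i \mid 1 \le i \le n\}.\, r \to s : \mathsf{Lock} \cdot {} \\
&& \mu Y.\, \left[
\begin{array}{l}
[\mu Z.\, [[r \to s : \mathsf{Get}(\mathsf{Str}) \cdot [s \to r : \mathsf{Value}(\mathsf{Str}, \mathsf{Nat}) \parallel Z]] + r \to s : \mathsf{Barrier}] \cdot Y] \\
{} + [r \to s : \mathsf{Set}(\mathsf{Str}, \mathsf{Nat}) \cdot Y] + [r \to s : \mathsf{Unlock} \cdot X]
\end{array}
\right] \\[2ex]
L_{c_i} &=& \mu X.\, c_i s\,!\,\mathsf{Lock} \cdot {} \\
&& \mu Y.\, \left[
\begin{array}{l}
[\mu Z.\, [[c_i s\,!\,\mathsf{Get}(\mathsf{Str}) \cdot [s c_i\,?\,\mathsf{Value}(\mathsf{Str}, \mathsf{Nat}) \parallel Z]] + c_i s\,!\,\mathsf{Barrier}] \cdot Y] \\
{} + [c_i s\,!\,\mathsf{Set}(\mathsf{Str}, \mathsf{Nat}) \cdot Y] + [c_i s\,!\,\mathsf{Unlock} \cdot X]
\end{array}
\right] \\[2ex]
L_s &=& \mu X.\, \exists r \in \{c_i \mid 1 \le i \le n\}.\, r s\,?\,\mathsf{Lock} \cdot {} \\
&& \mu Y.\, \left[
\begin{array}{l}
[\mu Z.\, [[r s\,?\,\mathsf{Get}(\mathsf{Str}) \cdot [s r\,!\,\mathsf{Value}(\mathsf{Str}, \mathsf{Nat}) \parallel Z]] + r s\,?\,\mathsf{Barrier}] \cdot Y] \\
{} + [r s\,?\,\mathsf{Set}(\mathsf{Str}, \mathsf{Nat}) \cdot Y] + [r s\,?\,\mathsf{Unlock} \cdot X]
\end{array}
\right]
\end{array}
\]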


Global type $r_1 \to r_2 : \ell(t)$ specifies the communication of a message labelled $\ell$
with a payload typed $t$ from sender $r_1$ to receiver $r_2$; global type $G_1 \cdot G_2$
specifies the sequential composition of global types $G_1$ and $G_2$; global type $G_1 + G_2$
specifies the alternative composition (choice) of global types $G_1$ and $G_2$; global
type $\exists r \in \{r_1, \ldots, r_n\}.\, G$ specifies the existential role quantification over domain
$\{r_1, \ldots, r_n\}$ (i.e., the alternative composition of $G[r_1/r]$ and ... and $G[r_n/r]$, where
$G[r_i/r]$ denotes the substitution of $r_i$ for every $r$ in $G$); global type $G_1 \parallel G_2$
specifies the interleaving composition of $G_1$ and $G_2$ (free merge [4]); global type $\mu X.\, G$
specifies recursion (i.e., $X$ is bound to $\mu X.\, G$ in $G$).
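The reading of existential role quantification as an alternative composition of substituted copies can be sketched in code. The Python fragment below is a minimal illustration under stated assumptions: the class names (`Comm`, `Seq`, `Choice`, `Var`, `Rec`, `Exists`) and the functions `subst` and `expand` are hypothetical and introduced only here; payload types and interleaving are omitted for brevity.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Tuple


@dataclass(frozen=True)
class Comm:            # r1 -> r2 : label
    sender: str
    receiver: str
    label: str


@dataclass(frozen=True)
class Seq:             # G1 . G2
    left: object
    right: object


@dataclass(frozen=True)
class Choice:          # G1 + G2
    left: object
    right: object


@dataclass(frozen=True)
class Var:             # recursion variable X
    name: str


@dataclass(frozen=True)
class Rec:             # mu X. G
    var: str
    body: object


@dataclass(frozen=True)
class Exists:          # exists r in domain. G
    role: str
    domain: Tuple[str, ...]
    body: object


def subst(g, r, ri):
    """G[ri/r]: substitute role ri for every occurrence of role variable r."""
    if isinstance(g, Comm):
        swap = lambda x: ri if x == r else x
        return Comm(swap(g.sender), swap(g.receiver), g.label)
    if isinstance(g, (Seq, Choice)):
        return type(g)(subst(g.left, r, ri), subst(g.right, r, ri))
    if isinstance(g, Rec):
        return Rec(g.var, subst(g.body, r, ri))
    if isinstance(g, Exists):
        if g.role == r:  # r is rebound here, so stop substituting
            return g
        return Exists(g.role, g.domain, subst(g.body, r, ri))
    return g  # recursion variable


def expand(g):
    """Replace every quantifier by the choice G[r1/r] + ... + G[rn/r]."""
    if isinstance(g, Exists):
        body = expand(g.body)
        return reduce(Choice, [subst(body, g.role, ri) for ri in g.domain])
    if isinstance(g, (Seq, Choice)):
        return type(g)(expand(g.left), expand(g.right))
    if isinstance(g, Rec):
        return Rec(g.var, expand(g.body))
    return g


# mu X. exists r in {c1, c2}. [[r->s:Add . s->r:Sum . X] + [r->s:Mul . s->r:Prod . X]]
body = Choice(Seq(Comm("r", "s", "Add"), Seq(Comm("s", "r", "Sum"), Var("X"))),
              Seq(Comm("r", "s", "Mul"), Seq(Comm("s", "r", "Prod"), Var("X"))))
expanded = expand(Rec("X", Exists("r", ("c1", "c2"), body)))
```

For the two-Client instance, `expanded` is the recursion over the choice between the copy of the body for `c1` and the copy for `c2`, which is the forbidden two-Client global type from the Introduction.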
Local type $r_1 r_2\,!\,\ell(t)$ specifies the send of an $\ell(t)$-message through the channel
from $r_1$ to $r_2$; dually, local type $r_1 r_2\,?\,\ell(t)$ specifies a receive. Because every
Client participates in only one branch of the quantification, their local types do
not contain $\exists$ under the recursion. In contrast, because the Server participates
in all branches, $L_s$ does contain $\exists$ under the recursion.

By Thm. 3, $G$ and the parallel composition of $L_{c_1}, \ldots, L_{c_n}, L_s$ are
operationally equivalent (weakly bisimilar), which in turn implies deadlock-freedom
and absence of protocol violations. Note also that our global type for the
Key-Value Store protocol indeed relies on solutions to the +-problem (choice between
multiple clients that send a Lock-message), the $\exists$-problem (existential
quantification over clients), and the ||-problem (unbounded interleaving to support
asynchronous responses of a statically unknown number of requests).


3 An MPST Theory with +, $\exists$, and ||


3.1 Types as Process Algebraic Terms 


We define our languages of global and local types as algebras over sets of (global) 
communications and (local) sends/receives. This subsection presents preliminar- 
ies on the generic algebraic framework we use, based on the existing algebras 
PA [3] and TCP+REC [2]; the next subsection presents our specific instantia- 
tions for global and local types. 

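Let $\mathbb{A}$ denote a set of actions, ranged over by $a$, and let $\{X_1, X_2, \ldots, Y, \ldots\}$
denote a set of recursion variables. Then, let $\mathrm{TERM}(\mathbb{A})$ denote the set of
(algebraic) terms, ranged over by $T$, generated by the following grammar:


\[ T ::= \colorbox{lightgray}{$1$} \mid a \mid T_1 + T_2 \mid T_1 \cdot T_2 \mid T_1 \parallel T_2 \mid X \mid \langle X_k \mid \{X_i \mapsto T_i\}_{i \in I} \rangle \quad (k \in I) \]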


Term $1$ specifies a skip; the grey background indicates it should not be
explicitly written by programmers (it is used only implicitly in the operational
semantics). Term $a$ specifies an atomic action from $\mathbb{A}$. Terms $T_1 + T_2$,
$T_1 \cdot T_2$, and $T_1 \parallel T_2$ specify the alternative composition, the sequential
composition, and the interleaving composition (free merge [4]; a form of parallel
composition without interaction between the operands) of $T_1$ and $T_2$. Terms $X$ and
$\langle X_k \mid \{X_i \mapsto T_i\}_{i \in I} \rangle$ specify recursion, where $\{X_i \mapsto T_i\}_{i \in I}$ is a recursive
specification that maps recursion variables to terms, $X_k$ is the initial call (for $T_k$),
and every $X_j$ that occurs in $T_k$ is a subsequent recursive call (for $T_j$); we write
$\mu X.\, T$ instead of $\langle X \mid \{X \mapsto T\} \rangle$.

Let $\mathbb{X} \rightharpoonup \mathrm{TERM}(\mathbb{A})$ denote the set of all recursive specifications (i.e.,
every recursive specification is a partial function), ranged over by $E, F$, and let
$\mathrm{sub}(E, T)$ denote the simultaneous substitution of term $E(X)$ for each recursion
variable $X$ in $T$. Fig. 3 defines the operational semantics of terms. It consists of
two components: relation $\to$ defines reduction of terms, while relation $\downarrow$ defines
successful termination of terms. In words, term $T_1 + T_2$ is reduced by reducing
either $T_1$ or $T_2$; term $T_1 \cdot T_2$ is reduced by reducing first $T_1$ and then $T_2$; term
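\[
\frac{}{a \xrightarrow{a} 1}
\qquad
\frac{T_1 \xrightarrow{a} T_1'}{T_1 + T_2 \xrightarrow{a} T_1'}
\qquad
\frac{T_2 \xrightarrow{a} T_2'}{T_1 + T_2 \xrightarrow{a} T_2'}
\qquad
\frac{T_1 \xrightarrow{a} T_1'}{T_1 \cdot T_2 \xrightarrow{a} T_1' \cdot T_2}
\qquad
\frac{T_1{\downarrow} \quad T_2 \xrightarrow{a} T_2'}{T_1 \cdot T_2 \xrightarrow{a} T_2'}
\]
\[
\frac{T_1 \xrightarrow{a} T_1'}{T_1 \parallel T_2 \xrightarrow{a} T_1' \parallel T_2}
\qquad
\frac{T_2 \xrightarrow{a} T_2'}{T_1 \parallel T_2 \xrightarrow{a} T_1 \parallel T_2'}
\qquad
\frac{\mathrm{sub}(E, E(X)) \xrightarrow{a} T}{\langle X \mid E \rangle \xrightarrow{a} T}
\]
(a) Reduction

\[
\frac{}{1{\downarrow}}
\qquad
\frac{T_1{\downarrow}}{T_1 + T_2{\downarrow}}
\qquad
\frac{T_2{\downarrow}}{T_1 + T_2{\downarrow}}
\qquad
\frac{T_1{\downarrow} \quad T_2{\downarrow}}{T_1 \cdot T_2{\downarrow}}
\qquad
\frac{T_1{\downarrow} \quad T_2{\downarrow}}{T_1 \parallel T_2{\downarrow}}
\qquad
\frac{\mathrm{sub}(E, E(X)){\downarrow}}{\langle X \mid E \rangle{\downarrow}}
\]
(b) Termination


Fig. 3: Operational semantics of terms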


$T_1 \parallel T_2$ is reduced by reducing $T_1$ and $T_2$ interleaved; and term $\langle X \mid E \rangle$ is reduced
by reducing the version of $E(X)$ where recursion variables have been substituted.
A term is 1-free if it has no occurrences of 1. A term is closed if it has
no occurrences of free recursion variables. A term $T$ is deterministic if (1) for
every action $a$, there exists at most one term $T'$ such that $T$ can reduce to $T'$
by performing $a$, and (2) every term to which $T$ can reduce is deterministic as
well. Henceforth, we consider only 1-free, closed, and deterministic terms. 
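The two relations can be prototyped directly as recursive functions over terms. The Python sketch below is an illustration under stated assumptions, not the implementation of our tool: terms are encoded as nested tuples (a hypothetical encoding), recursion is restricted to the $\mu X.\,T$ form and unfolded by substituting $\mu X.\,T$ for $X$, and every recursion variable is assumed to be guarded by an action (otherwise the functions do not terminate).

```python
ONE = ("one",)  # the skip term 1


def subst(t, x, s):
    """Substitute term s for recursion variable x in t."""
    tag = t[0]
    if tag == "var":
        return s if t[1] == x else t
    if tag in ("alt", "seq", "par"):
        return (tag, subst(t[1], x, s), subst(t[2], x, s))
    if tag == "mu":
        return t if t[1] == x else ("mu", t[1], subst(t[2], x, s))
    return t  # "one" or "act"


def terminates(t):
    """Successful termination of a term."""
    tag = t[0]
    if tag == "one":
        return True
    if tag == "alt":
        return terminates(t[1]) or terminates(t[2])
    if tag in ("seq", "par"):
        return terminates(t[1]) and terminates(t[2])
    if tag == "mu":
        return terminates(subst(t[2], t[1], t))
    return False  # actions and free variables do not terminate


def steps(t):
    """All reductions of t, as a set of (action, residual term) pairs."""
    tag = t[0]
    if tag == "act":
        return {(t[1], ONE)}
    if tag == "alt":
        return steps(t[1]) | steps(t[2])
    if tag == "seq":
        out = {(a, ("seq", u, t[2])) for a, u in steps(t[1])}
        if terminates(t[1]):
            out |= steps(t[2])
        return out
    if tag == "par":
        return ({(a, ("par", u, t[2])) for a, u in steps(t[1])}
                | {(a, ("par", t[1], u)) for a, u in steps(t[2])})
    if tag == "mu":
        return steps(subst(t[2], t[1], t))
    return set()  # "one" and free variables cannot reduce


# mu X. [[a . X] + b]: repeat a until b is chosen.
loop = ("mu", "X", ("alt", ("seq", ("act", "a"), ("var", "X")), ("act", "b")))
print(steps(loop) == {("a", ("seq", ONE, loop)), ("b", ONE)})  # True
```

Note that residual terms may contain the skip term, which is why 1 is part of the grammar even though programmers do not write it.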

We note that $(\mathbb{A}, +, \cdot, \parallel)$ is the signature of PA [3], while $(1, \mathbb{A}, +, \cdot, \parallel, \mathbb{X}, \langle \_ \mid \_ \rangle)$
is a subsignature of TCP+REC [2]. As the operational semantics of terms in
$\mathrm{TERM}(\mathbb{A})$ coincides with the operational semantics of terms in (the corresponding
subalgebra of) TCP+REC, our languages of global and local types inherit
TCP+REC’s sound and complete axiomatisation, used in our tool (Sect. 4.1).


3.2 Global Types and Local Types 


Actions. We instantiate $\mathrm{TERM}(\mathbb{A})$ to obtain languages of global and local types
by defining action sets for (global) communications and for (local) sends/receives.
Let $\mathbb{R} = \{a, b, \ldots\}$ denote the set of all role names, ranged over by $r$. Let
$\mathbb{L}\mathrm{AB} = \{\mathsf{Lock}, \mathsf{Get}, \ldots\}$ denote the set of all labels, ranged over by $\ell$. Let $\mathbb{T} =
\{\mathsf{Nat}, \mathsf{Bool}, \ldots\}$ denote the set of all payload types, ranged over by $t$. Let $\mathbb{U} =
\mathbb{L}\mathrm{AB} \times \mathbb{T}$ denote the set of all message types, ranged over by $U$; we write $\ell(t)$
instead of $(\ell, t)$. Finally, let $\mathbb{A}_g$ and $\mathbb{A}_l$ denote the sets of all (global)
communications and (local) sends/receives, ranged over by $g$ and $l$, generated by:


\[
\begin{array}{rcll}
g &::=& r_1 \to r_2 : U & (\text{if: } r_1 \neq r_2) \\
l &::=& r_1 r_2\,!\,U \mid r_1 r_2\,?\,U \mid \varepsilon^{r}_{r_1 r_2} & (\text{if: } r_1 \neq r_2 \text{ and } r_1 \neq r \neq r_2)
\end{array}
\]


Global action $r_1 \to r_2 : U$ specifies the communication of a $U$-message from
sender $r_1$ to receiver $r_2$; we note that communications are synchronous, as actions
in the underlying algebra are indivisible [2,3], but asynchrony can be encoded
(Exmp. 1, below). Local action $r_1 r_2\,!\,U$ specifies the send of a $U$-message through
channel $r_1 r_2$ (from $r_1$ to $r_2$). Dually, local action $r_1 r_2\,?\,U$ specifies a receive. Local
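\[
\mathrm{split}(r, r_1 \to r_2 : U) =
\begin{cases}
(1,\ r_1 \to r_2 : U) & \text{if: } r \in \{r_1, r_2\} \\
(r_1 \to r_2 : U,\ 1) & \text{otherwise}
\end{cases}
\]
\[
\mathrm{split}(r, G_1 \cdot G_2) =
\begin{cases}
(G_1',\ G_1'' \cdot G_2) & \text{if: } \mathrm{split}(r, G_1) = (G_1', G_1'') \text{ and } G_1'' \neq 1 \\
(G_1 \cdot G_2',\ G_2'') & \text{if: } \mathrm{split}(r, G_1) = (G_1', G_1'') \text{ and } G_1'' = 1 \text{ and} \\
 & \phantom{\text{if: }} \mathrm{split}(r, G_2) = (G_2', G_2'') \text{ and } G_2'' \neq 1 \\
(G_1 \cdot G_2,\ 1) & \text{otherwise}
\end{cases}
\]

\[
\frac{M \leadsto G \qquad \mathrm{split}(r_2, G) = (G', G'')}
     {r_1 \twoheadrightarrow r_2 : U \cdot M \ \leadsto\ r_1 \to r_1 r_2 : U \cdot [r_1 r_2 \to r_2 : U \parallel G'] \cdot G''}
\quad \text{(asynchrony)}
\]

\[
\frac{M_k \leadsto G_k \qquad \Sigma\{M_i\}_{i \in I \setminus \{k\}} \leadsto G \qquad k \in I}
     {\Sigma\{M_i\}_{i \in I} \ \leadsto\ G_k + G}
\quad \text{(n-ary choice)}
\]

\[
\frac{}{\hat{\mu}(X, \ell_c, \ell_e, \emptyset) \ \leadsto\ 1}
\quad \text{(finite recursion: base)}
\]

\[
\frac{\begin{array}{c}
M_i' = \hat{\mu}(X, \ell_c, \ell_e, \{(r_{1j}, r_{2j}, M_j)\}_{j \in I \setminus \{i\}}) \ \text{for all } i \in I \\
\Sigma\{[r_{1i} \to r_{2i} : \ell_c \cdot M_i \cdot X] + [r_{1i} \to r_{2i} : \ell_e \cdot M_i']\}_{i \in I} \leadsto G
\end{array}}
     {\hat{\mu}(X, \ell_c, \ell_e, \{(r_{1i}, r_{2i}, M_i)\}_{i \in I}) \ \leadsto\ \mu X.\, G}
\quad \text{(finite recursion: step)}
\]

\[
\frac{\Sigma\{M[r_i/r]\}_{i \in I} \leadsto G}
     {\exists r \in \{r_i\}_{i \in I}.\, M \ \leadsto\ G}
\quad \text{(existential role quantification)}
\]


Fig. 4: Macros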


action $\varepsilon^{r}_{r_1 r_2}$ specifies the idling of role $r$ during a communication between roles
$r_1$ and $r_2$. The inclusion of such annotated idling actions in local types is novel;
we shortly elaborate on its purpose.

We can now define $\mathrm{GLOB} = \mathrm{TERM}(\mathbb{A}_g)$ and $\mathrm{LOC} = \mathrm{TERM}(\mathbb{A}_l)$ as the sets of
all global and local types, ranged over by $G$ and $L$.


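Macros. As a testimony to the unique expressive power of our language of global
types, we extend it with a number of macros that can be expanded to “normal”
global types in GLOB. A macro $M$ is generated by the following grammar:


\[
\begin{array}{rcl}
M &::=& G \in \mathrm{GLOB} \ \mid\ r_1 \twoheadrightarrow r_2 : U \cdot M \ \mid\ \Sigma\{M_i\}_{i \in I} \ \mid \\
&& \hat{\mu}(X, \ell_c, \ell_e, \{(r_{1i}, r_{2i}, M_i)\}_{i \in I}) \ \mid\ \exists r \in \{r_i\}_{i \in I}.\, M
\end{array}
\]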


Degenerate “macro” $G$ is a normal global type; it is part of the grammar to nest
global types inside macros. Macro $r_1 \twoheadrightarrow r_2 : U \cdot M$ specifies an asynchronous
communication from sender $r_1$ to receiver $r_2$. Macro $\Sigma\{M_i\}_{i \in I}$ specifies an n-ary choice
among $|I|$ alternatives. Macro $\hat{\mu}(X, \ell_c, \ell_e, \{(r_{1i}, r_{2i}, M_i)\}_{i \in I})$ specifies finite
recursion: at the start of each unfolding of recursion variable $X$, for some $i \in I$,
either an $\ell_c$-message is communicated from sender $r_{1i}$ to receiver $r_{2i}$ (in which
case they continue their participation in the recursion), or an $\ell_e$-message is
communicated (in which case they exit). Macro $\exists r \in \{r_i\}_{i \in I}.\, M$ specifies existential
role quantification. Macros can be nested. Slightly abusing notation, we allow
macros to occur and be expanded freely in “normal” global types.

Fig. 4 defines the macro expansion rules. We note that the left-hand side of $\leadsto$
is a macro, while the right-hand side is a normal global type. We demonstrated
existential role quantification in Sect. 2; below, we give two more examples to 
illustrate our encoding of asynchronous communication and finite recursion. 


Example 1 (Asynchrony). Although communications are synchronous, we can
encode asynchrony by representing buffered channels (unordered, as in
asynchronous $\pi$-calculus [32]) explicitly as roles that participate in a protocol. To
this end, assume for all $r_1, r_2 \in \mathbb{R}$, there exists a role $r_1 r_2 \in \mathbb{R}$ as well (to
represent the buffer from $r_1$ to $r_2$); alternatively $r_1 r_2$ could be any fresh name.

The following global types (message types omitted) specify paradigmatic 
cases for protocols with asynchronous communications: 


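\[
\begin{array}{lcl}
M_1 = a \twoheadrightarrow b \cdot 1 & \leadsto & G_1 = a \to ab \cdot ab \to b \\
M_2 = a \twoheadrightarrow b \cdot a \twoheadrightarrow b \cdot 1 & \leadsto & G_2 = a \to ab \cdot [ab \to b \parallel a \to ab] \cdot ab \to b \\
M_3 = a \twoheadrightarrow b \cdot b \twoheadrightarrow a \cdot 1 & \leadsto & G_3 = a \to ab \cdot ab \to b \cdot b \to ba \cdot ba \to a \\
M_4 = a \twoheadrightarrow b \cdot a \to b & \leadsto & G_4 = a \to ab \cdot ab \to b \cdot a \to b
\end{array}
\]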


(For brevity, we omit 1 from the resulting global types; this can be incorporated 
in the macro expansion rules, at the expense of a more complex formulation.) 
Global type $G_1$ specifies an asynchronous communication from Alice to Bob.
Global type G2 specifies two asynchronous communications from Alice to Bob; 
Alice can do the second send already before Bob has done the first receive. 
Global type G3 specifies an asynchronous communication from Alice to Bob, 
followed by one from Bob to Alice; in contrast to G2, Bob can send only after 
he has received (i.e., this encoding of asynchrony preserves causality of messages 
sent and received by the same role). Global type G4 specifies an asynchronous 
communication from Alice to Bob, followed by a synchronous communication 
from Bob to Alice; it highlights that, unlike existing languages of global types, 
ours supports mixing synchrony and asynchrony in a single global type. 
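The expansions of this example can be replayed mechanically. The Python sketch below is an illustration under explicit assumptions: global types are encoded as nested tuples (a hypothetical encoding, message types omitted); `split(r, G)` is assumed to divide a sequence into the prefix that does not involve role `r` and the remainder that starts at `r`'s first communication; and `expand_async` is assumed to produce the send to the buffer, followed by the delivery interleaved with that prefix, followed by the remainder. The smart constructors drop the skip term, mirroring the omission of 1 above.

```python
ONE = ("one",)  # the skip term 1


def comm(r1, r2):
    return ("comm", r1, r2)


def seq(g1, g2):
    # Smart constructor that omits the skip term.
    if g1 == ONE:
        return g2
    if g2 == ONE:
        return g1
    return ("seq", g1, g2)


def par(g1, g2):
    if g1 == ONE:
        return g2
    if g2 == ONE:
        return g1
    return ("par", g1, g2)


def split(r, g):
    """Assumed reading: (prefix without r, remainder from r's first action)."""
    if g[0] == "comm":
        return (ONE, g) if r in (g[1], g[2]) else (g, ONE)
    if g[0] == "seq":
        g1, g2 = g[1], g[2]
        pre1, post1 = split(r, g1)
        if post1 != ONE:
            return (pre1, seq(post1, g2))
        pre2, post2 = split(r, g2)
        if post2 != ONE:
            return (seq(g1, pre2), post2)
    return (g, ONE)  # simplifying assumption for all other forms


def expand_async(r1, r2, cont):
    """Expand an asynchronous communication r1 ->> r2 followed by cont,
    using the buffer role named r1r2."""
    buf = r1 + r2
    pre, post = split(r2, cont)
    return seq(comm(r1, buf), seq(par(comm(buf, r2), pre), post))


g1 = expand_async("a", "b", ONE)
g2 = expand_async("a", "b", g1)
g3 = expand_async("a", "b", expand_async("b", "a", ONE))
```

Under these assumptions, `g2` lets the second send to the buffer `ab` happen before the first delivery to Bob, while `g3` stays fully sequential because Bob's own send comes after his receive.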


Example 2 (Finite recursion). The Key-Value Store protocol in Sect. 2 does not 
terminate: in its global type, the inner recursions (Y and Z) can be exited, but 
the outer recursion (X) cannot. A version of this protocol that terminates once 
each of the Clients has indicated it has finished using the store (e.g., by sending 
an Exit-message) can also be specified. 

We illustrate the key idea in a simplified example: 


\[ G_1 = \mu X.\, [[a \to c : \mathsf{Con} \cdot X] + a \to c : \mathsf{Exit}] \qquad G_2 = \mu X.\, [[b \to c : \mathsf{Con} \cdot X] + b \to c : \mathsf{Exit}] \]
\[ G = \mu X.\, [[a \to c : \mathsf{Con} \cdot X] + [a \to c : \mathsf{Exit} \cdot G_2] + [b \to c : \mathsf{Con} \cdot X] + [b \to c : \mathsf{Exit} \cdot G_1]] \]

Global type $G_1$ specifies the communication of either a Con-message (to continue
the recursion) or an Exit-message (to break it) from Alice to Carol. Global type
$G_2$ is similar. Global type $G$ specifies the communication of a Con-message from
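\[
\frac{\mathcal{L}(r){\downarrow} \ \text{ for all } r \in \mathrm{dom}\,\mathcal{L}}{\mathcal{L}{\downarrow}}
\qquad\qquad
\frac{\mathcal{L}(r_1) \xrightarrow{r_1 r_2 ! U} L_1 \qquad \mathcal{L}(r_2) \xrightarrow{r_1 r_2 ? U} L_2}
     {\mathcal{L} \xrightarrow{r_1 \to r_2 : U} \mathcal{L}[r_1 \mapsto L_1, r_2 \mapsto L_2]}
\qquad
\frac{\mathcal{L}(r) \xrightarrow{\varepsilon^{r}_{r_1 r_2}} L}
     {\mathcal{L} \xrightarrow{\varepsilon^{r}_{r_1 r_2}} \mathcal{L}[r \mapsto L]}
\]

(a) Termination (b) Reduction
Fig. 5: Operational semantics of groups of local types


\[
\begin{array}{ll}
G \upharpoonright r = G & \text{if: } G \in \{1\} \cup \mathbb{X} \\[1ex]
(G_1 * G_2) \upharpoonright r = (G_1 \upharpoonright r) * (G_2 \upharpoonright r) & \text{if: } * \in \{+, \cdot, \parallel\} \\[1ex]
(r_1 \to r_2 : U) \upharpoonright r =
\begin{cases}
r_1 r_2\,!\,U & \text{if: } r_1 = r \neq r_2 \\
r_1 r_2\,?\,U & \text{if: } r_1 \neq r = r_2 \\
\varepsilon^{r}_{r_1 r_2} & \text{if: } r_1 \neq r \neq r_2
\end{cases} & \\[1ex]
\langle X \mid E \rangle \upharpoonright r = \langle X \mid E \upharpoonright r \rangle &
E \upharpoonright r = \{X \mapsto E(X) \upharpoonright r \mid X \in \mathrm{dom}\, E\} \\[1ex]
G \upharpoonright R = \{r \mapsto G \upharpoonright r \mid r \in R\} & \text{if: } r(G) \subseteq R \neq \emptyset
\end{array}
\]


Fig. 6: Projection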


either Alice or Bob to Carol, or an Exit-message. In the latter case, Carol stops 
communicating with a role, while she proceeds communicating with the other 
role. Thus, the communications between Alice and Carol, and between Bob and 
Carol, are decoupled (i.e., decisions to continue or break recursions are made per 
role). Macro $\hat{\mu}$ generalizes this pattern to arbitrary recursion bodies.


Groups. Finally, let $\mathbb{R} \rightharpoonup \mathrm{LOC}$ denote the set of all groups of local types (i.e.,
every group is a partial function from role names to local types), ranged over
by $\mathcal{L}$. The idea is that while a global type specifies a protocol among $n$ roles
from one global perspective, a group of local types specifies a protocol from the
$n$ local perspectives. Fig. 5 defines the operational semantics of groups, built on
top of the operational semantics of local types; we use the $f[x \mapsto y]$ notation
to update function $f$ with entry $x \mapsto y$. In words, group $\mathcal{L}$ is reduced either by
synchronously reducing the local types of a sender $r_1$ and a receiver $r_2$ (yielding
a communication from $r_1$ to $r_2$), or by reducing the local type of an idling role.


3.3 End-Point Projection: from Global Types to Local Types 


A key part of MPST (Fig. 1) is a projection operator that consumes a global
type $G$ as input and produces a group of local types $\mathcal{L}$ as output; it is correct if,
under certain well-formedness conditions, $G$ and $\mathcal{L}$ are operationally equivalent.

Let $r(G)$ denote the set of all role names that occur in $G$. Fig. 6 defines
our projection operator. In words, the projection of a communication $r_1 \to r_2 : U$
onto a role $r$ is a send $r_1 r_2\,!\,U$ if the role is sender in the communication, a
receive $r_1 r_2\,?\,U$ if it is receiver, or an idling action $\varepsilon^{r}_{r_1 r_2}$ if it is not involved;
the projections of all other forms of global types onto $r$ are homomorphic; the
projection of a global type onto a set of roles $R$ is the corresponding group of
Fig. 7: Weak operational semantics ($T, T', T'' \in \mathrm{GLOB} \cup \mathrm{LOC} \cup (\mathbb{R} \rightharpoonup \mathrm{LOC})$):
(a) Termination; (b) Reduction


projections, where the side condition implies that the group is nonempty and
contains a local type for at least every role name that occurs in $G$. Thus, a group
of projections of $G$ is a partial function relative to the set of all roles $\mathbb{R}$, but it
is total relative to the set of roles $r(G) \subseteq \mathbb{R}$ that occur in $G$. (We note that we
also continue to assume global types are 1-free, closed, and deterministic.)

Our projection operator is similar to existing projection operators in the 
MPST literature [34], but it also differs in one fundamental respect: it produces
local types with annotated idling actions. These idling actions will be instrumen- 
tal in the definition of our well-formedness conditions. We note that no idling 
actions occur in the local types for the Key-Value Store protocol in Sect. 2. This 
is because after the idling actions have been used to establish well-formedness, 
they are of no more use and can be eliminated to simplify the local types. 
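The projection operator is homomorphic except on communications, which makes it short to prototype. The Python sketch below illustrates this under stated assumptions: the nested-tuple encoding and the names `project`, `roles`, and `project_group` are hypothetical, and recursion is restricted to the $\mu X.\,G$ form.

```python
def project(g, r):
    """Project global type g onto role r, keeping annotated idling actions."""
    tag = g[0]
    if tag == "comm":
        _, r1, r2, u = g
        if r == r1:
            return ("send", r1, r2, u)   # r1r2!U
        if r == r2:
            return ("recv", r1, r2, u)   # r1r2?U
        return ("idle", r, r1, r2)       # idling of r during r1 -> r2
    if tag in ("alt", "seq", "par"):
        return (tag, project(g[1], r), project(g[2], r))
    if tag == "mu":
        return ("mu", g[1], project(g[2], r))
    return g  # skip or recursion variable


def roles(g):
    """The set of role names that occur in g."""
    tag = g[0]
    if tag == "comm":
        return {g[1], g[2]}
    if tag in ("alt", "seq", "par"):
        return roles(g[1]) | roles(g[2])
    if tag == "mu":
        return roles(g[2])
    return set()


def project_group(g, rs):
    """Project g onto a nonempty role set that covers all roles of g."""
    if not rs or not roles(g) <= set(rs):
        raise ValueError("role set must be nonempty and cover all roles of g")
    return {r: project(g, r) for r in rs}


# c -> s : Add . s -> c : Sum, projected onto c, s, and an uninvolved role d.
g = ("seq", ("comm", "c", "s", "Add"), ("comm", "s", "c", "Sum"))
group = project_group(g, ["c", "s", "d"])
```

Here `group["d"]` consists only of idling actions, one per communication in which `d` is not involved; each records which two roles were communicating.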

The following lemmas state key properties about termination and reduction 
behaviour of global types and their projections: Lem. 1 states projection is sound 
and complete for termination; Lem. 2 states the same for reduction. 


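Lemma 1. $[G{\downarrow}$ implies $(G \upharpoonright r){\downarrow}]$ and $[(G \upharpoonright r){\downarrow}$ implies $G{\downarrow}]$


Proof. By induction on $G$.

Lemma 2. $[G \xrightarrow{g} G'$ implies $(G \upharpoonright r) \xrightarrow{g \upharpoonright r} (G' \upharpoonright r)]$
and $[(G \upharpoonright r) \xrightarrow{g \upharpoonright r} L$ implies $[G \xrightarrow{g} G'$ and $L = G' \upharpoonright r]$ for some $G']$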


Proof. Both conjuncts are proven by induction on the structure of G, also using 
Lem. 1 (needed because termination plays a role in reduction of -). 


3.4 Weak Bisimilarity of Global Types, Local Types, and Groups 


The idling actions introduced in local types by our projection operator are inter- 
nal, because they never compose into communications that emerge between local 
types in groups. Therefore, the operational equivalence relation under which we 
prove the correctness of projection should be insensitive to idling actions. 
First, let A_τ = {ε^r_{r1r2} | r1 ≠ r2 and r1 ≠ r ≠ r2} denote the set of all
internal actions, ranged over by τ. Second, Fig. 7 defines an extension of our
operational semantics (Fig. 3) with relations that assert weak termination and
weak reduction (i.e., versions of termination and reduction that are insensitive to
internal actions). Third, Fig. 8 defines weak bisimilarity (≈), in terms of weak
similarity (≾), in terms of weak termination and weak reduction; it coincides
with the definition found in the literature (e.g., [2]), with the administrative




Fig. 8: Weak operational equivalence, with coinductive rules for weak similarity
(≾) and weak bisimilarity (≈); T1, T1′, T2, T2′ ∈ Glob ∪ Loc ∪ (R → Loc)


exception that we need the fourth rule in Fig. 7b to account for the fact we 
have multiple different internal actions. We use a double horizontal line in the 
formulation of rules to indicate they should be applied coinductively. 
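
As an informal illustration (not part of the formal development), weak termination and weak reduction over a finite transition system can be sketched in Python. The dictionary encoding, the convention that idling labels start with `eps`, and all function names are assumptions of this sketch; it uses the common reading of a weak reduction as internal actions, then one action, then internal actions, which may differ in detail from the rules of Fig. 7.

```python
from collections import deque

def is_internal(label):
    # Assumption of this sketch: idling actions are labels starting with "eps".
    return label.startswith("eps")

def tau_closure(lts, state):
    """States reachable from `state` by zero or more internal actions."""
    seen, queue = {state}, deque([state])
    while queue:
        s = queue.popleft()
        for label, target in lts.get(s, []):
            if is_internal(label) and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

def weak_successors(lts, state, action):
    """States reachable by internal actions, then `action`, then internal actions."""
    result = set()
    for s in tau_closure(lts, state):
        for label, target in lts.get(s, []):
            if label == action:
                result |= tau_closure(lts, target)
    return result

def weakly_terminates(lts, state, final):
    """True if some terminated state is reachable by internal actions only."""
    return any(s in final for s in tau_closure(lts, state))

# Hypothetical local type: idle, send to b, idle, terminated.
LTS = {"s0": [("eps_cd", "s1")], "s1": [("ab!", "s2")],
       "s2": [("eps_ef", "s3")], "s3": []}
```

In this example, the send `ab!` is weakly enabled in `s0` even though it is not directly enabled there, which is exactly the insensitivity to internal actions that the weak relations are meant to capture.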

The notion of weak reduction allows us to generalize the soundness and com-
pleteness of projection from roles (Lem. 2) to groups of roles: Lem. 3 states (1)
if G can g-reduce to G′ and the projection of G′ is defined, then the group of
projections of G can reduce to the group of projections of G′, either directly or
with a trailing weak τ-reduction; (2) conversely, if the group of projections of G
can g-reduce to ℒ′, then G can g-reduce to G′ and either ℒ′ equals the group of
projections of G′, or it can get there with a weak reduction.


Lemma 3. [[G ─g→ G′ and G′↾↾R is defined] implies
[(G↾↾R) ─g→ (G′↾↾R) or (G↾↾R) ─g→ ℒ′ ═τ⇒ (G′↾↾R)] for some ℒ′, τ]
and [(G↾↾R) ─g→ ℒ′ implies
[G ─g→ G′ and [ℒ′ = G′↾↾R or ℒ′ ═τ⇒ (G′↾↾R)]] for some G′, τ]


Proof. Both conjuncts are proven by induction on R, also using Lem. 2. 


3.5 Well-formedness of Global Types


In general, projection does not preserve weak operational semantics.


Example 3 (Bad protocols). The following global types (message types omitted)
specify “bad” protocols that do not permit “good” concurrent implementations: 


G1 = a→b + a→c
G1↾a = ab! + ac!
G1↾b = ab? + ε^b_{ac}
G1↾c = ε^c_{ab} + ac?

G2 = a→b · c→d
G2↾a = ab! · ε^a_{cd}
G2↾b = ab? · ε^b_{cd}
G2↾c = ε^c_{ab} · cd!
G2↾d = ε^d_{ab} · cd?


Global type G1 specifies a communication from Alice to either Bob or Carol,
chosen by Alice. This is a bad protocol, because if Alice chooses Bob, there is no
way for Carol to know (and vice versa): Carol cannot locally distinguish between
whether Alice has not made her choice yet, or whether Alice has chosen Bob.
Formally, this is manifested in the fact that Carol’s local type can at any time
choose to perform idling action ε^c_{ab} (i.e., local type G1↾c has two reductions,
neither one of which has priority), thereby assuming that Alice has chosen Bob.
However, Bob can symmetrically assume that Alice has chosen Carol. As a result,
the group projection can reduce as follows: G1↾↾{a,b,c} ─ε^c_{ab}→ ℒ1 ─ε^b_{ac}→ ℒ2.
Now, ℒ2 cannot reduce further, but Alice has not terminated yet. This sequence of
reductions cannot be (weakly) simulated by G1.

Global type G2 specifies a communication from Alice to Bob, followed by a 
communication from Carol to Dave. This is a bad protocol, because there is no 
way for Carol and Dave to know when the communication from Alice to Bob 
has occurred. Formally, this is manifested in the fact that Carol’s and Dave’s 
local types can at any time choose to perform idling actions, thereby assuming 
that the communication from Alice to Bob has occurred. As a result, the group 
projection can reduce as follows: G2↾↾{a,b,c,d} → ℒ1 → ℒ2 → ℒ3 → ℒ4.
This sequence cannot be (weakly) simulated by G2.
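
To make the first bad protocol concrete, the following Python sketch replays the problematic reduction sequence of G1's group of projections. It is only an illustration under assumptions: local types are encoded as small hand-written state machines, all idling actions are collapsed into a single label `eps`, and the names `GROUP` and `group_steps` are hypothetical.

```python
# Local types of G1 as tiny state machines: state -> [(action, next_state)].
# State "1" marks successful termination.
ALICE = {"a0": [("ab!", "1"), ("ac!", "1")], "1": []}
BOB = {"b0": [("ab?", "1"), ("eps", "1")], "1": []}
CAROL = {"c0": [("eps", "1"), ("ac?", "1")], "1": []}
GROUP = {"a": ALICE, "b": BOB, "c": CAROL}

def group_steps(state):
    """Enumerate the reductions of a group state (dict: role -> local state)."""
    steps = []
    for role, local_type in GROUP.items():
        for action, nxt in local_type[state[role]]:
            if action == "eps":
                # Idling is internal to a single role.
                steps.append((f"eps@{role}", {**state, role: nxt}))
            elif action.endswith("!"):
                # A send can only fire together with the matching receive.
                receiver, receive = action[1], action[:-1] + "?"
                for action2, nxt2 in GROUP[receiver][state[receiver]]:
                    if action2 == receive:
                        steps.append((action[:-1],
                                      {**state, role: nxt, receiver: nxt2}))
    return steps

state = {"a": "a0", "b": "b0", "c": "c0"}
# Carol idles (assuming Alice chose Bob), then Bob idles (assuming Alice chose Carol).
state = next(s for label, s in group_steps(state) if label == "eps@c")
state = next(s for label, s in group_steps(state) if label == "eps@b")
stuck = group_steps(state) == []
alice_terminated = state["a"] == "1"
```

After the two idling steps, `stuck` holds while `alice_terminated` does not: Alice still wants to send, but neither receiver is listening any more.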


Next, we define two well-formedness conditions that invalidate the previous 
examples; in Sect. 3.6, we prove that if these conditions are satisfied by a global 
type G, it is indeed guaranteed that G and G↾↾R are operationally equivalent
(i.e., weakly bisimilar). Instead of defining the conditions in terms of global types, 
we define them in terms of projections (i.e., local types). Informally: 


C For every r ∈ R, for every choice that local type G↾r has between a weak
reduction ═l⇒ (where l is a send, a receive, or an idling action) and a com-
pletely unobservable weak reduction ═τ⇒, choosing to perform the former
does not disable the latter, and vice versa. This can be thought of as a form
of commutativity between l and τ.

EC For every r ∈ R, one of the following is true:

1. For every weak reduction ═l⇒ that local type G↾r can perform
(where l is a send or a receive, but not an idling action), it can perform
a reduction ─l→. That is, if G↾r can perform l in the future after idling
actions, it can do l already eagerly in the present.

2. Local type G↾r is the start of a causal chain: a sequence of τ-reductions,
followed by a non-τ-reduction, that are “causally related” to each other.
An ε^r_{r1r2}-reduction is causally related to a subsequent reduction that
involves roles r3 and r4 iff {r1, r2} ∩ {r3, r4} ≠ ∅. Globally speaking, this
means communication between r3 and r4 must be preceded by communication
between r1 and r2.


These conditions must hold coinductively for all local types that G↾r can reduce
to. Essentially, these conditions state that by performing idling actions, a local
type can neither decrease its possible behaviour (C), nor increase it (EC-1),
unless it is guaranteed the added behaviour cannot be exercised yet, because it
is causally related to other communications that need to happen first (EC-2).
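
The causal-relatedness test of EC-2 is a plain set intersection, as the following Python sketch shows. The tuple encoding of idling actions as (idling role, r1, r2) and the function name are assumptions of this sketch, not notation from the formal definitions.

```python
def causally_related(idling, next_roles):
    """Check whether an idling action is causally related to a later action.

    `idling` is (idler, r1, r2): role `idler` idles while r1 and r2 communicate.
    `next_roles` are the two roles involved in the later action.
    """
    _idler, r1, r2 = idling
    return bool({r1, r2} & set(next_roles))

# Carol's projection in the second bad protocol: idle for a->b, then send c->d.
bad = causally_related(("c", "a", "b"), ("c", "d"))

# Hypothetical variant in which Carol idles for a->b and then receives from b.
good = causally_related(("c", "a", "b"), ("b", "c"))
```

Here `bad` is false because {a, b} and {c, d} are disjoint, so nothing forces the communication between Carol and Dave to wait for Alice and Bob; `good` is true because Bob takes part in both communications.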


Example 4 (Bad protocols, continued). Global type G1 (Exmp. 3) is ill-formed:
its projections onto b and c violate condition C. Global type G2 (Exmp. 3) is 
also ill-formed: its projections onto c and d violate condition EC. 




Fig. 9: Well-formedness conditions, with rules for C, EC, and Chain;
A, A′, A″, A1, A1′, A2, A2′ ∈ Loc ∪ (R → Loc)


Fig. 9 defines C and EC formally. We define C not only for local types, but also 
for groups of local types, as this simplifies some notation later on. We prove key 
properties of C: Thm. 1 states commutativity of local sends/receives/idling (l) in
local types gets lifted to commutativity of global communications/idling (α) in
groups of local types; Lem. 4 states weak bisimilarity preserves commutativity. 


Theorem 1. [[[C^{l,τ}(ℒ(r)) for all l, τ] for all r ∈ dom ℒ] implies
[C^{α,τ}(ℒ) for all α, τ]]
and [[C(ℒ(r)) for all r ∈ dom ℒ] implies C(ℒ)]


Proof. The first conjunct is proven by induction on the rules of ⇒. The second
is proven by coinduction on the rule of C, also using the first conjunct.


Lemma 4. [[C^{α,τ}(ℒ1) and ℒ1 ≈ ℒ2] implies C^{α,τ}(ℒ2)]
and [[C(ℒ1) and ℒ1 ≈ ℒ2] implies C(ℒ2)]


Proof. The first conjunct is proven by applying the definitions of C and ≈; the
second is proven by coinduction on the rule of C, also using the first conjunct.


We also prove key properties of Chain and EC, both of which work specifically
for groups of projections: Lem. 5 states if the projections of r1 and r2 are both
causal chains, they cannot weakly reduce to local types where they can perform
reciprocal actions (r1 the send; r2 the receive); Thm. 2 states eagerness of lo-
cal sends/receives (not idling) in projections gets lifted to eagerness of global
communications in groups of projections (cf. Thm. 1).


Lemma 5. [[Chain((G↾↾R)(r1)) and (G↾↾R)(r1) ⇒ ℒ′(r1) ─r1r2!U→ ℒ″(r1)] and
[Chain((G↾↾R)(r2)) and (G↾↾R)(r2) ⇒ ℒ′(r2) ─r1r2?U→ ℒ″(r2)]] implies false


Proof. By induction on the rules of ⇒.


Theorem 2. [[EC^{l,τ}((G↾↾R)(r)) for all l ∉ A_τ, τ, r ∈ R] implies
[EC^{α,τ}(G↾↾R) for all α, τ]]
and [[EC(ℒ(r)) for all r ∈ dom ℒ] implies EC(ℒ)]


Proof. The first conjunct is proven by using Lem. 5; the second is proven by 
coinduction on the rule of EC, also using the first conjunct. 


We note that, in contrast to Lem. 4 for C, we do not have a lemma that states 
weak bisimilarity preserves EC. Such a lemma would have been highly useful in 
our subsequent proofs, but it is unfortunately false, because weak bisimilarity 
does not preserve Chain. A simple counterexample, for local types, is this: L1 =
r1r2!U and L2 = ε^{r3}_{r4r5} · r1r2!U, where {r1, r2} ∩ {r3, r4, r5} = ∅. While L1 and
L2 are weakly bisimilar, L1 is the start of a unary causal chain, but L2 is not.
The problem here is that Chain depends on the role names associated with idling 
actions, whereas weak bisimilarity abstracts those role names away. 

We call a global type well-formed if each of its projections satisfies C and EC. 


3.6 Correctness of Projection under Well-Formedness 


We now prove our main result: if a global type is well-formed, it is weakly
bisimilar to the group of its projections. We start by defining a relation ⋈ to
relate global types with groups of local types (denoted by R in Fig. 8):


C(G↾↾R)    EC(G↾↾R)    (G↾↾R) ═*⇒ ℒ′ ⇐*═ ℒ    C(ℒ)
──────────────────────────────────────────────────
G ⋈ ℒ


Here, we write ℒ1 ═*⇒ ℒ2 as an abbreviation for:


[ℒ1 ≈ ℒ1′ ═τ⇒ ℒ2′ ≈ ℒ2 for some ℒ1′, ℒ2′] or ℒ1 ≈ ℒ2

In words, ℒ1 ═*⇒ ℒ2 means ℒ1 has a silent reduction (only τ-s) to a term that
is weakly bisimilar to ℒ2, or ℒ1 is already weakly bisimilar to ℒ2 (without any
reductions). Essentially, if C(G↾↾R) and EC(G↾↾R), then ⋈ relates G to a set
of groups S = {ℒ | G ⋈ ℒ} that can roughly be characterised as follows:


— (base) G↾↾R is in S;
— (successors) any group to which G↾↾R can silently reduce, is in S;
— (predecessors) any group that can silently reduce to G↾↾R, is in S;
— (pseudo-predecessors) any group that can silently reduce to a group to which
G↾↾R can silently reduce, is in S;
— (closure) S is closed under weak bisimilarity.


The following technical lemma states if a well-formed group of projections
G↾↾R can weakly g-reduce to some group ℒ′, then the original global type G
can g-reduce to some G′, and ℒ′ and the group of projections of G′ either are
weakly bisimilar, or they can weakly reduce to a weakly bisimilar group ℒ″.


Lemma 6. [C(G↾↾R) and EC(G↾↾R) and (G↾↾R) ═g⇒ ℒ′]
implies [[G ─g→ G′ and (G′↾↾R) ═*⇒ ℒ″ ⇐*═ ℒ′] for some G′, ℒ″]


Proof. By induction on the rules of ⇒, also using Lem. 3.


The following two lemmas state key properties of ⋈: Lem. 7 states ⋈ preserves
termination (as weak termination); Lem. 8 states ⋈ coinductively preserves re-
duction (as weak reduction). Together, these lemmas imply ⋈ ⊆ ≾ and ⋈⁻¹ ⊆ ≾,
which in turn imply ⋈ ⊆ ≈.


Lemma 7. [[G ⋈ ℒ and G↓] implies ℒ⇓]
and [[G ⋈ ℒ and ℒ↓] implies G⇓]


Proof. The first conjunct is proven by induction on the rules of ⇒, also using
Lem. 1; the second is proven by contradiction (assume not G↓; derive false;
conclude G↓; it implies G⇓).


Lemma 8. [[G ⋈ ℒ and G ─g→ G′] implies [[G′ ⋈ ℒ′ and ℒ ═g⇒ ℒ′] for some ℒ′]]
and [[G ⋈ ℒ and ℒ ─g→ ℒ′] implies [[G′ ⋈ ℒ′ and G ═g⇒ G′] for some G′]]
and [[G ⋈ ℒ and ℒ ─τ→ ℒ′] implies G ⋈ ℒ′]


Proof. The first and second conjuncts are proven by induction on the rules of ⇒,
also using Lemmas 3-4; the third is likewise proven by induction on the rules of ⇒.


Theorem 3. [C(G↾↾R) and EC(G↾↾R)] implies G ≈ (G↾↾R)


Proof. By coinduction on the rule of ≾ (Fig. 8), also using Lemmas 7-8.


A group of local types ℒ enjoys deadlock-freedom if it either has successfully
terminated (ℒ↓; Fig. 5a) or can make another reduction. A group of local types
ℒ enjoys absence of protocol violations relative to global type G if, coinductively,
every non-τ reduction of ℒ can be simulated by G (i.e., every communication
in the group is “permitted” by G). The following corollary relates the operational
equivalence of Thm. 3 to these classical MPST properties:


Corollary 1. If global type G is well-formed, then the group of G’s projections
enjoys deadlock-freedom and absence of protocol violations relative to G.


The key insight is that global types are by definition free
of deadlocks (they either reduce to 1, or they never terminate; Fig. 3), while
weak bisimilarity preserves deadlock-freedom of global types in their projections
(notably, weak bisimilarity is sensitive to termination, and a group of local types
terminates only if all individual local types terminate; Fig. 5a). Weak bisimilarity
also directly implies freedom of protocol violations.


3.7 Decidability of Checking Well-Formedness 


We note our proof of Thm. 3 is non-constructive, in the sense that ⋈ is infinitely
large (i.e., for each group of local types, there exist infinitely many weakly bisim-
ilar groups). The following proposition states this is not a problem in practice.


Proposition 1. Checking C(L) and EC(L) is decidable. 


The rationale behind this proposition is as follows. First, to check C(ℒ) and
EC(ℒ), by Thm. 1 and Thm. 2, it suffices to check C(ℒ(r)) and EC(ℒ(r)) for
each r ∈ dom ℒ. For each such local type ℒ(r), there are two possibilities.

If local type ℒ(r) has finite control, its state space can be exhaustively ex-
plored in finite time, so checking C(ℒ(r)) and EC(ℒ(r)) is decidable.

In contrast, if ℒ(r) has non-finite control, we make two observations. The
first observation is that the only possible source of infinity is the occurrence of
recursion variables under parallel composition. The second observation is that
C and EC are true for L1 || L2 if they are true for L1 and L2 separately; this is
because C and EC essentially assert a “diamond structure” on the reductions of
L1 || L2, which is precisely the operational semantics of || (Fig. 3). Thus, we can
check C(L1 || L2) and EC(L1 || L2) by checking C(L1), C(L2), EC(L1), and EC(L2),
thereby “avoiding” the possible source of infinity.

We note that splitting the checks for parallel composition in this way not only 
ensures decidability; it also avoids exponential state explosion (in the number of 
nested ||-operators in a single local type) in local types with finite control. 
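
The effect of splitting the checks can be illustrated with a small Python sketch that counts explored states. It assumes hypothetical two-state components and a pure-interleaving product (no synchronisation); it does not implement C or EC, but only contrasts the size of the monolithic state space with the sum of the component state spaces.

```python
from collections import deque

def reachable(initial, successors):
    """Exhaustively explore a finite state space by breadth-first search."""
    seen, queue = {initial}, deque([initial])
    while queue:
        s = queue.popleft()
        for _label, t in successors(s):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen

def component(i):
    """Hypothetical two-state local type: perform one send, then terminate."""
    def succ(state):
        return [(f"send{i}", 1)] if state == 0 else []
    return succ

def interleave(components):
    """Successor function of the parallel composition (pure interleaving)."""
    def succ(state):
        out = []
        for i, comp in enumerate(components):
            for label, t in comp(state[i]):
                out.append((label, state[:i] + (t,) + state[i + 1:]))
        return out
    return succ

n = 4
comps = [component(i) for i in range(n)]
monolithic = len(reachable((0,) * n, interleave(comps)))  # product state space
modular = sum(len(reachable(0, c)) for c in comps)        # checked one by one
```

With four components, the product has 2^4 = 16 states, whereas exploring the components separately touches only 2 · 4 = 8 states; the gap widens exponentially as more operands are composed.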




3.8 Discussion of Challenges 


Our use of (weak) bisimilarity, plus the key insight to annotate silent actions with
additional information to keep track of choices, made the problem of proving the
correctness of projection (Thm. 3) feasible. The major technical challenges to
achieve this were defining the right bisimulation relation (Sect. 3.6) and discov-
ering corresponding well-formedness conditions (Sect. 3.5).

A naive weak bisimulation relation, R_naive, relates every global type only
with its group of projections. R_naive is sufficient to prove that every reduction
of a global type can be weakly simulated with one non-silent reduction of the
group (sender and receiver), followed by a number of silent reductions (idling
processes). In contrast, R_naive is insufficient to prove that every reduction of the
group can be simulated by its global type, because of silent actions: if global type
G is related to group of projections ℒ by R_naive, and a silent action subsequently
reduces ℒ to ℒ′, the simulation fails, as R_naive does not relate G to ℒ′.


Fig. 10: Overview of mpstpp. Pipeline: parse .glob or .scr file (yielding a global
type); project onto all roles (yielding local types); check well-formedness;
generate APIs in Java from the local types (if well-formed)

To alleviate this issue, we defined the bisimulation relation in such a way 
that it relates every global type G to a group of local types that are not nec- 
essarily equal to the projections of G, but every local type can be behind the 
corresponding projection (the local type can reach the projection with silent 
actions) or ahead (the projection can reach the local type with silent actions). 


4 Practical Experience with the Theory 


4.1 Implementation 


Tool. We implemented a tool, mpstpp, based on the core theoretical contribu- 
tions of this paper. Fig. 10 shows a high-level overview of the tool, including the 
main components (boxes) and data flows (arrows). 

First, mpstpp parses an input .glob-file to a data structure for a global type 
G (programmer-friendly Scribble-style syntax [35] is also supported as input). 
Then, it projects G onto all roles that occur in G. Then, it checks each of the re- 
sulting local types for well-formedness, depending on settings, either sequentially 
or in parallel: a key advantage of the formulation of our well-formedness condi- 
tions is that they can be checked modularly for every role in isolation, enabling 
us to take advantage of modern multicore hardware. Finally, if the local types 
are well-formed, idling actions are eliminated and typed communication APIs are 
generated from the local types to enable MPST++-based programming in Java. 
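
The modular structure of the check can be sketched as follows in Python. This is not the tool's code (mpstpp targets Java): `check_role` is a hypothetical placeholder for a per-role well-formedness check, and the dictionary encoding of local types is an assumption of the sketch. It only shows that, because each role is checked in isolation, the sequential and the parallel mode differ in scheduling and not in logic.

```python
from concurrent.futures import ThreadPoolExecutor

def check_role(role, local_type):
    """Placeholder for checking C and EC on one projection (hypothetical)."""
    return local_type.get("well_formed", False)

def check_all(projections, parallel=True):
    """Check every role's local type, either sequentially or in a thread pool."""
    items = list(projections.items())
    if parallel:
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(lambda item: check_role(*item), items))
    else:
        results = [check_role(role, lt) for role, lt in items]
    # The global type is well-formed only if every projection is.
    return all(results)

ok = check_all({"a": {"well_formed": True}, "b": {"well_formed": True}})
not_ok = check_all({"a": {"well_formed": True}, "b": {"well_formed": False}})
```

In CPython, threads do not speed up CPU-bound work, so a real Python port would likely use processes instead; the sketch is meant to convey the structure only.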


Optimisations. Parsing, computing projections, and generating APIs are rela-
tively inexpensive; instead, the run times of our tool are dominated by checks for
well-formedness. We therefore implemented several optimisations to make these 
checks more efficient. Before we present these optimisations, we first note that 
the complexity of checking well-formedness of a local type L is polynomial in 
the number of successors that can be reached from L (Fig. 9). 

(1) Our first optimisation targets local types with parallel composition; local
type L1 || L2 is potentially a serious bottleneck, as its number of successors is
exponential in the number of nested ||-operators. Therefore, even with finite state
spaces, we check the well-formedness of L1 || L2 by checking the well-formedness
of L1 and L2, without explicitly considering the exponentially many successors
of L1 || L2, exploiting the same observation as with decidability (Sect. 3.7).

(2) Our second optimisation concerns computation of weak reductions. In
particular, to check whether C and EC are true for a local type L, according to
their definitions (Fig. 9), we need to iterate over each of its weak reductions.
Especially if L has many τ-reductions (Fig. 7), computing the set of weak reduc-
tions can be expensive. To avoid this, mpstpp computes sound (but incomplete)
approximations of C and EC. We implemented two kinds of approximations: (a)
checking versions of C and EC where every occurrence of ⇒ in the definition is
replaced with →, and (b) checking L ≈ L′ for every τ-reduction from L to L′.
Approximation (a) is sound for both C and EC (rationale: if individual reductions
can commute, sequences of reductions consisting of those individual reductions
can commute as well), but approximation (b) is sound only for C (rationale:
auxiliary relation Chain of EC is not preserved by weak bisimilarity). To ensure
soundness, thus, mpstpp never uses approximation (b) for EC.
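
The idea behind approximation (a) can be sketched in Python as a single-step diamond check on a finite transition system. This is a deliberate simplification and not the tool's algorithm: it compares target states by identity instead of by weak bisimilarity, and the encoding of transition systems, the `eps` prefix for idling labels, and the function names are assumptions of the sketch.

```python
def is_internal(label):
    # Assumption of this sketch: idling actions are labels starting with "eps".
    return label.startswith("eps")

def commutes_strongly(lts):
    """Check that every action commutes with every competing idling action.

    For each state s with s -l-> s1 and s -t-> s2 (t internal), require a
    common state s3 with s1 -t-> s3 and s2 -l-> s3 (a one-step diamond).
    """
    for s, edges in lts.items():
        for l, s1 in edges:
            for t, s2 in edges:
                if not is_internal(t) or (l, s1) == (t, s2):
                    continue
                if is_internal(l) and s1 == s2:
                    continue  # two idling steps to the same state are harmless
                after_l = {x for label, x in lts[s1] if label == t}
                after_t = {x for label, x in lts[s2] if label == l}
                if not (after_l & after_t):
                    return False
    return True

# Carol's projection of the first bad protocol: idling disables the receive.
carol = {"c0": [("eps_ab", "1"), ("ac?", "1")], "1": []}

# A hypothetical local type in which a send and an idling action form a diamond.
diamond = {"s0": [("ab!", "s1"), ("eps_cd", "s2")],
           "s1": [("eps_cd", "s3")], "s2": [("ab!", "s3")], "s3": []}
```

The check rejects `carol` and accepts `diamond`, matching the intuition that performing an idling action must not take away behaviour that was available before.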

(3) Our third optimisation targets the checks for weak bisimilarity that occur
in several places in the definitions of C and EC (Fig. 9). Instead of computing the
full reduction relations and running an algorithm to decide their weak bisimilarity
(which would be computationally costly), we take advantage of the fact that our
language of local types is based on existing algebras (Sect. 3.1) that have sound
and complete axiomatisations. Specifically, to check whether two local types are
weakly bisimilar, mpstpp applies the axioms as rewrite rules and compares the
resulting normal forms for structural equality. To ensure rewriting is fast, we
sacrificed completeness (i.e., we use rewriting only to eliminate as many silent
actions as possible in a sound way, but for instance, our rewrite procedure cannot
prove that (L1 · τ) + L2 and L2 + L1 are weakly bisimilar); however, for the many
examples we tried (including this paper’s), this optimisation is highly effective.

Optimisations (2) and (3) are conservative: mpstpp may conclude C or EC is 
false, even though it is actually true. While this affects completeness, soundness 
is guaranteed: if mpstpp concludes a local type is well-formed, it really is. 


4.2 Evaluation of the Approach 


Setup. In the previous section, we formulated and proved the theoretical cor- 
rectness of our well-formedness conditions (Thm. 3). In this section, we demon- 
strate the practical usefulness through experimental evaluation in benchmarks. 
Specifically, we show that checking our well-formedness conditions is faster and 
more scalable than explicitly checking operational equivalence (which currently 
seems the only alternative to attain the same level of expressiveness as our work). 

In our benchmarks, we compare three approaches to check operational equiv- 
alence between a global type and its group of projected local types: 


— mpstpp-SEQ (baseline): In this approach, the mpstpp tool is used to check our 
well-formedness conditions (which imply operational equivalence; Thm. 3) 
without using any form of parallel processing. 


— mpstpp-PAR: Like mpstpp-SEQ, except each projected local type is checked
in a separate thread. The fact our well-formedness conditions can be easily 
parallelised in this way is an important practical advantage. 

— EXPLICIT: In this approach, mpstpp is used only for parsing and projecting; 
after that, we use the state-of-the-art verification tool set mCRL2 [10,20,29] 
to explicitly check operational equivalence (details below). 


We identified six example protocols (details below) that can naturally be 
scaled in the number of roles N (e.g., the number of Clients in the Key-Value 
Store protocol). Using each of the three approaches, for each of the protocols, for 
each value of N between the minimal number of roles N_min (e.g., N_min = 2 in the
Key-Value Store protocol: the Server and one Client) and 16, we subsequently
checked operational equivalence; varying N in this way yields insight not only into
per-case performance, but also into scalability. To get statistically reliable results [31],
we repeated executions as many times as was necessary until the 95% confidence 
interval was within 5% of our reported means (i.e., there is a 95% probability 
that the true mean is within 5% of our reported means). 
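
The stopping rule can be sketched in Python as follows. The sketch is an assumption-laden illustration and not our benchmark harness: it uses a normal approximation (z = 1.96) for the 95% confidence interval, and the names `measure_until_stable`, `min_runs`, and `max_runs` are hypothetical.

```python
import math
import statistics

def mean_with_ci(samples, z=1.96):
    """Return the sample mean and the half-width of its confidence interval."""
    mean = statistics.mean(samples)
    half_width = z * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, half_width

def measure_until_stable(run_once, rel=0.05, min_runs=5, max_runs=10_000):
    """Repeat `run_once` until the interval lies within `rel` of the mean."""
    samples = [run_once() for _ in range(min_runs)]
    while len(samples) < max_runs:
        mean, half_width = mean_with_ci(samples)
        if half_width <= rel * mean:
            break
        samples.append(run_once())
    return statistics.mean(samples), len(samples)

# A perfectly stable (hypothetical) measurement stops after the minimum runs.
mean, runs = measure_until_stable(lambda: 2.0)
```

Noisy measurements simply trigger more repetitions, so fast and slow configurations are reported with comparable relative precision.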

We ran our benchmarks on a machine with an Intel Xeon 6130 processor (16 
cores; no hyper-threading), using Debian 9, Java 13, and mCRL2 201908.0. 


Translation to mCRL2. In the EXPLICIT approach, we use mCRL2 [10,20,29] 
to explicitly check if global type G and its group of projections ℒ are opera-
tionally equivalent. Our choice for mCRL2 is motivated by the fact our languages 
of global and local types are based on the same process algebra as mCRL2’s spec- 
ification language, so their translation to mCRL2 specifications is direct and 
straightforward. Moreover, mCRL2 is mature (e.g., used in industry [5]), and 
it uses optimised, state-of-the-art algorithms to check behavioural equivalences 
(e.g., [28]), so we are comparing our tool with a serious competitor. 

First, we translate global type G to mCRL2 specification ⟦G⟧. Then, we use
mCRL2 tools mcrl22lps and lps2lts to normalize ⟦G⟧ to a linear process spec-
ification (LPS) and generate a corresponding labelled transition system (LTS).
Because of the directness of the translation, the transition labels in the resulting
LTS are all global communication actions of the form r1→r2:U.

Second, we translate group of projections ℒ, consisting of roles r1, ..., rn, to
mCRL2 specification ⟦ℒ⟧. It looks as follows (in formal mCRL2 notation [29]):


∇_{ {ri→rj:U | 1 ≤ i,j ≤ n, i ≠ j, U ∈ 𝕌} } (
Γ_{ {(rirj!U | rirj?U) → (ri→rj:U) | 1 ≤ i,j ≤ n, i ≠ j, U ∈ 𝕌} } (
⟦ℒ(r1)⟧ || ⋯ || ⟦ℒ(rn)⟧ ) )


where each ⟦ℒ(ri)⟧ is a direct translation of local type ℒ(ri) to an mCRL2
specification; || is a form of parallel composition that prescribes both interleaving
and synchronisation of operand actions; | is synchronous composition of actions;
Γ is the communication operator that replaces synchronised local send/receive
actions rirj!U | rirj?U with global communication action ri→rj:U; and ∇ is
the allow operator that allows only global communication actions to be executed
(i.e., unsynchronised, individual send/receive actions cannot be executed).

When translating a local type ℒ(ri) to an mCRL2 specification ⟦ℒ(ri)⟧, to
make mCRL2’s subsequent verification easier, we already eliminate as many
idling actions ε^r_{r1r2} as possible (modulo branching bisimulation); those that re-
main are represented as a general τ action, because mCRL2 does not need the
additional information provided by ε^r_{r1r2}. Then, we use mcrl22lps and lps2lts
to generate an LPS and LTS for ⟦ℒ⟧.

Third, we use mCRL2 tool ltscompare to check if the LTS for ⟦G⟧ is weakly
bisimilar to the LTS for ⟦ℒ⟧. We note that normalisation to an LPS using
mcrl22lps is a requirement to use ltscompare.


Protocols. We used the following protocols in our benchmarks: 


Key-Value Store (KVS): This protocol is the same protocol as the one pre-
sented in Sect. 2, except each inner parallel composition (||) is replaced with
sequential composition (·). This is because mcrl22lps does not support nor-
malisation of mCRL2 specifications where || occurs under recursion.

Load Balancer (LB): This protocol consists of a Master and a number of 
Workers. Iteratively, first, a Request-message is communicated from the Mas- 
ter to one of the Workers; then, a Response-message is communicated from 
that Worker to the Master. 

Work Stealing (WS): This protocol consists of a Master and a number of 
Workers. Iteratively, a Job-message is communicated from the Master to one 
of the Workers. Meanwhile, Workers can try to “steal” jobs from each other: 
at any point, first, a Steal-message can be communicated from one Worker 
to another Worker; then, either a Job-message (if the former Worker has a 
job to spare) or a None-message (otherwise) is communicated from the latter 
Worker to the former Worker. 

Map/Reduce (MR): This protocol consists of a Master and a number of Work- 
ers. First, in no particular order, a Map-message is communicated from the 
Master to each Worker; then, in no particular order, a Reduce-message is 
communicated from each Worker to the Master. 

Peer-to-Peer (PtP): This protocol consists of a number of Peers. Unordered, 
a Msg-message is communicated from each Peer to each other Peer. 

Pub/Sub (PS): This protocol consists of a Publisher and a number of Sub- 
scribers. In no particular order, a Sub-message can be communicated once 
from each Subscriber to the Publisher to gain a subscription. Concurrently, 
a Pub-message can be communicated from the Publisher to each Subscriber 
with a subscription. 


The table summarises the features used in each of these protocols.

[Table: features (including + and ||) used per protocol; columns KVS, LB, WS,
MR, PtP, PS]

For each 1 ≤ n ≤ 15, we instantiated the Key-Value Store, Load Balancer, Work
Stealing, and Map/Reduce protocols with 1 Server/Master + n Clients/Workers.
For each 2 ≤ n ≤ 16, we instantiated the Peer-to-Peer protocol with n Peers. For
each 2 ≤ n ≤ 7, we instantiated the Pub/Sub protocol with 1 Publisher and n Sub-
scribers; we did not instantiate the Pub/Sub protocol with n > 7 Subscribers, as
the resulting global types are too large (their size grows exponentially in n).


Fig. 11: Speedups (y-axis; y>1E+0 means faster, y<1E+0 means slower) of EX-
PLICIT relative to mpstpp-SEQ as the number of roles increases (x-axis)


Benchmark results. Figures 11-12 show the results of our benchmarks. The
x-axis indicates the number of roles; the y-axis indicates relative speed-ups. The
baselines are at y=1E+0 and y=1: above it, a competing approach is faster than
mpstpp-SEQ; below it, it is slower. We draw two conclusions.

(1) For each protocol and number of roles, mpstpp-SEQ outperforms
EXPLICIT. In the cases of Key-Value Store and Load Balancer, EXPLICIT grows
towards mpstpp-SEQ, but the growth levels off as the number of roles increases,
while EXPLICIT is still about two orders of magnitude slower than mpstpp-SEQ
in the best of circumstances. In the cases of Work Stealing, Peer-to-Peer, and
Pub/Sub, the LTSs generated from the translated mCRL2 specifications were
too large to be compared (i.e., ltscompare produced an error) beyond 7, 5, and
5 roles; this was no issue for mpstpp-SEQ. In the case of Map/Reduce, the LTSs
were small enough to compare using mCRL2’s ltscompare, but after an initial
upwards slope for 2 ≤ N ≤ 7 roles, EXPLICIT starts to perform progressively worse.

(2) Especially for larger numbers of roles, parallelisation can yield
serious performance improvements. In the cases of Key-Value Store and
Load Balancer, mpstpp-PAR outperforms mpstpp-SEQ only with 14-16 roles; for
smaller numbers of roles, parallel execution is slower. In the worst case (Load
Balancer, 2 roles), the slowdown is roughly a factor of 3.4; we hypothesise that be-
cause of the low absolute execution times, the cost of spawning and synchronising
threads outweighs their benefit. However, the ascending gradient indicates that
as the number of roles increases, relatively more of the total work can be paral-
lelised, yielding progressive rewards. In the cases of Work Stealing, Map/Reduce,
Peer-to-Peer, and Pub/Sub, similar trends can be observed, except y=1 is crossed
sooner. The absolute execution times for these protocols and for small numbers
of roles are higher than for Key-Value Store and Load Balancer.


Fig. 12: Speedups (y-axis; y>1 means faster, y<1 means slower) of mpstpp-PAR
relative to mpstpp-SEQ as the number of roles increases (x-axis)


5 Related Work 


Multiparty compatibility. Closest to this paper is existing literature on mul- 
tiparty compatibility [6,24,40,42]. The key idea, initially developed by Deniélou 
and Yoshida for the original MPST [23,24], is to represent (groups of) local types 
operationally as (systems of) communicating finite state machines (CFSM) [8]. A 
CFSM M is a state machine where transitions are labelled with sends/receives; 
a system of CFSMs S is a parallel composition where CFSMs communicate 
through asynchronous buffers. Multiparty compatibility, then, is a condition on 
the reachable states and transitions of a system S = (M1, ..., Mn): if it is sat-
isfied by S, the system is guaranteed to be safe (no deadlocks; no unmatched
sends/receives) and live (S terminates, assuming at least one Mi can termi-
nate). Multiparty compatibility is a sufficient condition to guarantee safety and
liveness, but not necessary: there exist safe/live systems that are not multiparty
compatible. Therefore, several generalisations have been proposed to cover timed
behaviour [6], undirected choice [40], and non-synchronisability [42].

The main similarities between our method in this paper and the multiparty 
compatibility approach are: (1) we also use an operational interpretation of local 
types; (2) we guarantee similar liveness/safety properties; and (3) we also neatly
factor out the act of checking conformance of processes to local types (resp. CF-
SMs). In contrast, we support a wider range of behaviours. Moreover, from a
practical/computational perspective, multiparty compatibility is a global condi-
tion that needs to be checked on the whole state space of a system (i.e., parallel 
composition of the CFSMs), prone to exponential blow-up; our well-formedness 
conditions, in contrast, are completely local and require only polynomial time to 
check. The reason we do not require CFSM-like machinery in this paper is that 
our operational correspondence (weak bisimilarity) is sensitive to termination: 
notably, in Fig. 5a, a group of local types terminates iff every individual lo- 
cal type terminates (for multiparty compatibility, proofs are done modulo trace 
equivalence [24], which cannot distinguish between successful/abnormal termi- 
nation and is therefore in itself too weak to show deadlock-freedom). 


Expressiveness of MPST. In the original MPST theory [33], and many of 
its descendants (e.g., [14,19,22,24,25,43]), the restrictions on choices are en- 
forced through a combination of syntax and additional well-formedness con- 
ditions. Notably, in these works, communications in global types are specified 
as r_1 → r_2 : {ℓ_i.G_i}_{i∈I}, so syntactically, it is impossible to specify choices among
senders or receivers. There exist also papers where a seemingly more general
binary +-like operator is introduced, particularly those that support choices
among receivers [16,23,36,40], but the well-formedness conditions still basically
restrict the use of + in these works to r_1 → r_2 : {ℓ_i.G_i}_{i∈I} or r → {r_i : ℓ_i.G_i}_{i∈I}.

This is the first paper where well-formedness conditions do not force the use 
of + into one of those two restricted forms. Moreover, our well-formedness con- 
ditions are compatible with unbounded interleaving (recursion under parallel), 
beyond similar operators in previous work [16,22,23,43]. An alternative approach 
is to completely omit statically checked well-formedness conditions (and projec- 
tion), and to only dynamically verify communication actions against global types 
through monitoring, as recently proposed [30]. The language of global types in 
that paper is more expressive than ours in this paper, but all verification happens 
at run-time, whereas we provide correctness guarantees already at compile-time. 


Session types and model checking. Recently, there has been growing interest 
in using model checking to verify properties of (multiparty) session types, similar 
to our use of mCRL2 as an alternative to checking well-formedness (Sect. 4.2). 
Lange et al. [39] infer behavioural types from Go programs and use mCRL2 to 
verify the inferred types, to establish safety properties (combined with another 
tool, KITTeL [26], to establish liveness). Hu and Yoshida [36] use a custom model 
checker to verify safety and progress properties of local types (represented as 
CFSMs) as part of API generation in the Scribble toolchain for MPST [35]. 
Closest to our use of mCRL2 is the work of Scalas et al. [52,53], where mCRL2
is used to verify properties of local types (e.g., deadlock-freedom), while a form of 
dependent type-checking is used to verify conformance of processes against those 
types (i.e., actors in Scala); no global types and projection are used, though (pro- 
grammers write local types manually). The idea is that properties model-checked 
on the types carry over to the processes. Similarly, Scalas and Yoshida [51] use 
mCRL2 to model-check session environments, as a more expressive alternative 
to the classical consistency condition needed to prove subject reduction. Note 
that [51, Theorem 5.15] shows that, in the case that a set of processes is typable 
by a single multiparty session (i.e. a single global type), type-level properties 
including safety, deadlock-freedom and liveness guarantee the same properties 
for multiparty session π-processes. Hence our type-level analysis is directly us-
able to provide decidable procedures to verify session π-calculi with extended
expressiveness [51, Theorem 7.2]. 


6 Conclusion 


A key open problem with multiparty session types (MPST) concerns expressive-
ness: none of the previous languages of global and local types supports arbitrary
choice (e.g., choices between different senders), existential quantification over 
roles, and unbounded interleaving of subprotocols (in the same session). In this 
paper, we presented the first theory that supports these features. Our main the- 
oretical result is operational equivalence under weak bisimilarity: this guarantees 
classical MPST properties for groups of local types projected from a global type, 
namely freedom of deadlocks and absence of protocol violations. Our main prac- 
tical result is that our well-formedness conditions, which guarantee operational 
equivalence, can be checked orders of magnitude faster than directly checking 
weak bisimilarity, which is demonstrated by our benchmark results. 


We identify several interesting avenues for future work. First, it is useful to 
extend our theory with parametrisation along the lines of Castro et al. [18] (which 
currently works only for restrictive choices); their proof technique for correctness 
seems to offer substantial synergy with our bisimilarity-based approach in this 
paper. Second, we aim to investigate extensions of our theory with subtyping 
(e.g., in terms of weak similarity). Notably, while asynchronous communication 
can be encoded in our current theory, asynchronous subtyping is known to be 
undecidable [9,41], so the connection between the two is interesting to explore. 
Abstract. Multithreaded programs generally leverage efficient and thread-safe
concurrent objects like sets, key-value maps, and queues. While some concurrent- 
object operations are designed to behave atomically, each witnessing the atomic 
effects of predecessors in a linearization order, others forego such strong consis- 
tency to avoid complex control and synchronization bottlenecks. For example, 
contains(value) methods of key-value maps may iterate through key-value
entries without blocking concurrent updates, to avoid unwanted performance 
bottlenecks, and consequently overlook the effects of some linearization-order 
predecessors. While such weakly-consistent operations may not be atomic, they 
still offer guarantees, e.g., only observing values that have been present. 

In this work we develop a methodology for proving that concurrent object 
implementations adhere to weak-consistency specifications. In particular, we 
consider (forward) simulation-based proofs of implementations against relaxed- 
visibility specifications, which allow designated operations to overlook some of 
their linearization-order predecessors, i.e., behaving as if they never occurred. Be- 
sides annotating implementation code to identify linearization points, i.e., points 
at which operations’ logical effects occur, we also annotate code to identify visible 
operations, i.e., operations whose effects are observed; in practice this annotation 
can be done automatically by tracking the writers to each accessed memory 
location. We formalize our methodology over a general notion of transition 
systems, agnostic to any particular programming language or memory model, 
and demonstrate its application, using automated theorem provers, by verifying 
models of Java concurrent object implementations. 


1 Introduction 


Programming efficient multithreaded programs generally involves carefully organiz- 
ing shared memory accesses to facilitate inter-thread communication while avoiding 
synchronization bottlenecks. Modern software platforms like Java include reusable 
abstractions which encapsulate low-level shared memory accesses and synchronization 
into familiar high-level abstract data types (ADTs). These so-called concurrent objects 
typically include mutual-exclusion primitives like locks, numeric data types like atomic 
integers, as well as collections like sets, key-value maps, and queues; Java’s standard- 
edition platform contains many implementations of each. Such objects typically provide 
strong consistency guarantees like linearizability [18], ensuring that each operation
appears to happen atomically, witnessing the atomic effects of predecessors according 
to some linearization order among concurrently-executing operations. 


© The Author(s) 2020 
While such strong consistency guarantees are ideal for logical reasoning about
programs which use concurrent objects, these guarantees are too strong for many oper- 
ations, since they preclude simple and/or efficient implementation — over half of Java’s 
concurrent collection methods forego atomicity for weak-consistency [13]. On the one 
hand, basic operations like the get and put methods of key-value maps typically admit 
relatively-simple atomic implementations, since their behaviors essentially depend 
upon individual memory cells, e.g., where the relevant key-value mapping is stored. 
On the other hand, making aggregate operations like size and contains(value) atomic
would impose synchronization bottlenecks, or otherwise-complex control structures, 
since their atomic behavior depends simultaneously upon the values stored across 
many memory cells. Interestingly, such implementations are not linearizable even 
when their underlying memory operations are sequentially consistent, e.g., as is the 
case with Java 8’s concurrent collections, whose memory accesses are data-race free.⁴

For instance, the contains(value) method of Java’s concurrent hash map iterates
through key-value entries without blocking concurrent updates in order to avoid
unreasonable performance bottlenecks. Consequently, in a given execution, a contains-
value-v operation o_1 will overlook operation o_2’s concurrent insertion of k_1 ↦ v for a
key k_1 it has already traversed. This oversight makes it possible for o_1 to conclude that
value v is not present, and can only be explained by o_1 being linearized before o_2. In the
case that operation o_3 removes k_2 ↦ v concurrently before o_1 reaches key k_2, but only
after o_2 completes, then atomicity is violated since in every possible linearization, either
mapping k_2 ↦ v or k_1 ↦ v is always present. Nevertheless, such weakly-consistent
operations still offer guarantees, e.g., that values never present are never observed, and 
initially-present values not removed are observed. 
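
The interleaving described above can be replayed deterministically. The following Python sketch is a sequential simulation under assumed names (`store`, `KEYS`, and the fixed step order are hypothetical); it is not Java's concurrent hash map, only an illustration of how a non-blocking traversal can miss a value that is present at every instant.

```python
# Keys are traversed in a fixed order; initially only k2 maps to v.
KEYS = ["k1", "k2"]
V = "v"

def replay():
    store = {"k2": V}           # initial state: k2 maps to v
    present_at_each_step = []   # whether v is in the map after each step

    def snapshot():
        present_at_each_step.append(V in store.values())

    snapshot()
    # o1 (contains-value v) visits k1: there is no entry yet, so nothing is found.
    found = store.get(KEYS[0]) == V
    snapshot()
    # o2 inserts k1 -> v after o1 has already passed k1, and completes.
    store["k1"] = V
    snapshot()
    # o3 removes k2 -> v after o2 completes, but before o1 reaches k2.
    del store["k2"]
    snapshot()
    # o1 visits k2: the entry is gone, so o1 concludes that v is absent.
    found = found or store.get(KEYS[1]) == V
    snapshot()
    return found, present_at_each_step

result, history = replay()
```

Here `result` is `False` although every entry of `history` is `True`: no single atomic point of the traversal explains the answer, which is the violation of atomicity described in the text.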

In this work we develop a methodology for proving that concurrent-object imple- 
mentations adhere to the guarantees prescribed by their weak-consistency specifica- 
tions. The key salient aspects of our approach are the lifting of existing sequential ADT 
specifications via visibility relaxation [13], and the harnessing of simple and mechaniz- 
able reasoning based on forward simulation by relaxed-visibility ADTs. Effectively, 
our methodology extends the predominant forward-simulation based linearizability- 
proof methodology to concurrent objects with weakly-consistent operations, and 
enables automation for proving weak-consistency guarantees. 

To enable the harnessing of existing sequential ADT specifications, we adopt the 
recent methodology of visibility relaxation [13]. As in linearizability [18], the return
value of each operation is dictated by the atomic effects of its predecessors in some 
(i.e., existentially quantified) linearization order. To allow consistency weakening, 
operations are allowed, to a certain extent, to overlook some of their linearization-order 
predecessors, behaving as if they had not occurred. Intuitively, this (also existentially 
quantified) visibility captures the inability or unwillingness to atomically observe 
the values stored across many memory cells. To provide guarantees, the extent of 


4 Java 8 implementations guarantee data-race freedom by accessing individual shared-memory
cells with atomic operations via volatile variables and compare-and-swap instructions. Starting 
with Java 9, the implementations of the concurrent collections use the VarHandle mechanism 
to specify shared variable access modes. Java’s official language and API specifications do not 
clarify whether these relaxations introduce data races. 
visibility relaxation is bounded to varying degrees. Notably, the visibility of an absolute
operation must include all of its linearization-order predecessors, while the visibility 
of a monotonic operation must include all happens-before predecessors, along with 
all operations visible to them. The majority of Java’s concurrent collection methods 
are absolute or monotonic [13]. For instance, in the contains-value example described 
above, by considering that operation o_2 is not visible to o_1, the conclusion that v is not
present can be justified by the linearization o_2; o_3; o_1, in which o_1 sees o_3’s removal
of k_2 ↦ v yet not o_2’s insertion of k_1 ↦ v. Ascribing the monotonic visibility to
the contains-value method amounts to a guarantee that initially-present values are 
observed unless removed (i.e., concurrently). 


While relaxed-visibility specifications provide a means of describing the guar-
antees provided by weakly-consistent concurrent-object operations, systematically
establishing implementations’ adherence requires a strategy for demonstrating simula- 
tion [25], i.e., that each step of the implementation is simulated by some step of (an 
operational representation of) the specification. The crux of our contribution is thus 
threefold: first, to identify the relevant specification-level actions with which to relate 
implementation-level transitions; second, to identify implementation-level annotations 
relating transitions to specification-level actions; and third, to develop strategies for 
devising such annotations systematically. For instance, the existing methodology based 
on linearization points essentially amounts to annotating implementation-level 
transitions with the points at which its specification-level action, i.e., its atomic effect, 
occurs. Relaxed-visibility specifications require not only a witness for the existentially- 
quantified linearization order, but also an existentially-quantified visibility relation, 
and thus require a second kind of annotation to resolve operations’ visibilities. We
propose a notion of visibility actions which enable operations to declare their visibility 
of others, e.g., specifying the writers of memory cells it has read. 


The remainder of our approach amounts to devising a systematic means for con- 
structing simulation proofs to enable automated verification. Essentially, we identify a 
strategy for systematically annotating implementations with visibility actions, given 
linearization-point annotations and visibility bounds (i.e., absolute or monotonic), and 
then encode the corresponding simulation check using an off-the-shelf verification 
tool. For the latter, we leverage CIVL [16], a language and verifier for Owicki-Gries style
modular proofs of concurrent programs with arbitrarily-many threads. In principle,
since our approach reduces simulation to safety verification, any safety verifier could
be used, though CIVL facilitates reasoning for multithreaded programs by capturing
interference at arbitrary program points. Using CIVL, we have verified monotonicity of
the contains-value and size methods of Java’s concurrent hash-map and concurrent
linked-queue, respectively, and absolute consistency of add and remove operations.
Although our models are written in CIVL and assume sequentially-consistent memory
accesses, they capture the difficult aspects of weak-consistency in Java, including heap- 
based memory access; furthermore, our models are also sound with respect to Java 8’s 
memory model, since their Java 8 implementations guarantee data-race freedom. 


In summary, we present the first methodology for verifying weakly-consistent op- 
erations using sequential specifications and forward simulation. Contributions include: 
- the formalization of our methodology over a general notion of transition systems,
agnostic to any particular programming language or memory model (§3);

- the application of our methodology to verifying a weakly-consistent contains-value
method of a key-value map (§4); and

- a mechanization of our methodology used for verifying models of weakly-consistent
Java methods using automated theorem provers (§5).


Aside from the outline above, this article summarizes an existing weak-consistency 
specification methodology via visibility relaxation (§2), summarizes related work (§6), 
and concludes (§7). Proofs of all theorems and lemmas are listed in Appendix A.


2 Weak Consistency 


Our methodology for verifying weakly-consistent concurrent objects relies both on the 
precise characterization of weak consistency specifications, as well as a proof technique 
for establishing adherence to specifications. In this section we recall and outline a 
characterization called visibility relaxation [13], an extension of sequential abstract 
data type (ADT) specifications in which the return values of some operations may not 
reflect the effects of previously-effectuated operations. 

Notationally, in the remainder of this article, ε denotes the empty sequence, ∅
denotes the empty set, _ denotes an unused binding, and ⊤ and ⊥ denote the Boolean
values true and false, respectively. We write R(x) to denote the inclusion x ∈ R of
a tuple x in the relation R; and R[x ↦ y] to denote the extension R ∪ {xy} of R to
include xy; and R | X to denote the projection R ∩ X* of R to set X; and R̄ to denote
the complement {x : x ∉ R} of R; and R(x) to denote the image {y : xy ∈ R} of R on
x; and R⁻¹(y) to denote the pre-image {x : xy ∈ R} of R on y; whether R(x) refers
to inclusion or an image will be clear from its context. Finally, we write x_i to refer to
the ith element of tuple x = x_0 x_1 ....
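
For readers who prefer executable notation, the relational operators above can be mirrored on finite relations represented as Python sets of tuples. The function names below are assumptions of this sketch, and the complement is omitted because it depends on a choice of universe.

```python
def extend(R, x, y):
    """R[x -> y]: the extension of R with the pair xy."""
    return R | {(x, y)}

def project(R, X):
    """R | X: keep only the tuples whose components all lie in X."""
    return {t for t in R if all(c in X for c in t)}

def image(R, x):
    """R(x) = {y : xy in R}."""
    return {b for (a, b) in R if a == x}

def preimage(R, y):
    """The pre-image {x : xy in R} of R on y."""
    return {a for (a, b) in R if b == y}

R = {(1, 2), (1, 3), (2, 3)}
```

With this `R`, the image of 1 is `{2, 3}`, the pre-image of 3 is `{1, 2}`, and projecting to `{1, 2}` leaves only the pair `(1, 2)`.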


2.1 Weak-Visibility Specifications 


For a general notion of ADT specifications, we consider fixed sets M and X of method
names and argument or return values, respectively. An operation label λ = ⟨m, x, y⟩
is a method name m ∈ M along with argument and return values x, y ∈ X. A read-
only predicate is a unary relation R(λ) on operation labels, an operation sequence
s = λ_0 λ_1 ... is a sequence of operation labels, and a sequential specification S =
{s_0, s_1, ...} is a set of operation sequences. We say that R is compatible with S when S
is closed under deletion of read-only operations, i.e., λ_0 ... λ_{j-1} λ_{j+1} ... λ_i ∈ S when
λ_0 ... λ_i ∈ S and R(λ_j).


Example 1. The key-value map ADT sequential specification S_m is the prefix-closed
set containing all sequences λ_0 ... λ_i such that λ_i is either:


- ⟨put, kv, b⟩, and b = ⊤ iff some ⟨rem, k, _⟩ follows any prior ⟨put, kv, _⟩;
- ⟨rem, k, b⟩, and b = ⊤ iff no other ⟨rem, k, _⟩ follows some prior ⟨put, kv, _⟩;
- ⟨get, k, v⟩, and no ⟨put, kv′, _⟩ nor ⟨rem, k, _⟩ follows some prior ⟨put, kv, _⟩, and
  v = ⊥ if no such ⟨put, kv, _⟩ exists; or
- ⟨has, v, b⟩, and b = ⊤ iff no prior ⟨put, kv′, _⟩ nor ⟨rem, k, _⟩ follows some prior
  ⟨put, kv, _⟩.

The read-only predicate R_m holds for the following cases:
R_m(⟨put, _, b⟩) if ¬b    R_m(⟨rem, _, b⟩) if ¬b    R_m(⟨get, _, _⟩)    R_m(⟨has, _, _⟩).
This is a simplification of Java’s Map ADT, i.e., with fewer methods.⁵
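
One way to read Example 1 operationally is as a replay of the labels against a dictionary. The Python sketch below is such a reading, under the assumptions that put always overwrites the mapping for its key, that its return value reports whether the key was previously unmapped, and that ⊤ and ⊥ are encoded as the strings `TOP` and `BOT`; the function names are hypothetical.

```python
TOP, BOT = "⊤", "⊥"

def expected(prefix, method, arg):
    """Return value that Example 1 prescribes for (method, arg) after `prefix`."""
    store = {}
    for m, a, _ in prefix:               # replay the effects of the earlier labels
        if m == "put":
            key, value = a
            store[key] = value
        elif m == "rem":
            store.pop(a, None)
    if method == "put":
        return TOP if arg[0] not in store else BOT   # ⊤ iff the key is unmapped
    if method == "rem":
        return TOP if arg in store else BOT          # ⊤ iff the key is mapped
    if method == "get":
        return store.get(arg, BOT)                   # ⊥ when no mapping exists
    if method == "has":
        return TOP if arg in store.values() else BOT
    raise ValueError(method)

def in_spec(seq):
    """Membership in S_m: each label's return value matches its prefix."""
    return all(expected(seq[:i], m, a) == r for i, (m, a, r) in enumerate(seq))

example = [("put", (1, 1), TOP), ("get", 1, 1), ("put", (0, 1), TOP),
           ("put", (1, 0), BOT), ("has", 1, TOP)]
```

Because every prefix is checked, the accepted set is prefix-closed by construction. The sequence `example` is accepted, while the same sequence with the final return value changed to ⊥ is rejected.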


To derive weak specifications from sequential ones, we consider a set V of ex-
actly two visibility labels from prior work [13]: absolute and monotonic.⁶ A visibility
annotation V : M → V maps each method m ∈ M to a visibility V(m) ∈ V.

Intuitively, absolute visibility requires operations to observe the effects of all of their 
linearization-order predecessors. The weaker monotonic visibility requires operations 
to observe the effects of all their happens-before (i.e., program- and synchronization- 
order) predecessors, along with the effects already observed by those predecessors, 
i.e., so that sets of visible effects are monotonically increasing over happens-before 
chains of operations; conversely, operations may ignore effects which have been ignored 
by their happens-before predecessors, so long as those effects are not transitively related 
by program and synchronization order. 


Definition 1. A weak-visibility specification W = ⟨S, R, V⟩ is a sequential specifica-
tion S with a compatible read-only predicate R and a visibility annotation V.


Example 2. The weakly-consistent contains-value map W_m = ⟨S_m, R_m, V_m⟩ annotates
the key-value map ADT methods of S_m from Example 1 with:


V_m(put) = V_m(rem) = V_m(get) = absolute,    V_m(has) = monotonic.
Java’s concurrent hash map appears to be consistent with this specification [13].


We ascribe semantics to specifications by characterizing the values returned by 
concurrent method invocations, given constraints on invocation order. In practice, the 
happens-before order among invocations is determined by a program order, i.e., among 
invocations of the same thread, and a synchronization order, i.e., among invocations 
of distinct threads accessing the same atomic objects, e.g., locks. A history h =
⟨O, inv, ret, hb⟩ is a set O ⊆ ℕ of numeric operation identifiers, along with an invoca-
tion function inv : O → M × X mapping operation identifiers to method names and
argument values, a partial return function ret : O ⇀ X mapping operation identifiers
to return values, and a (strict) partial happens-before relation hb ⊆ O × O; the empty
history h_∅ has O = inv = ret = hb = ∅. An operation o ∈ O is complete when ret(o)
is defined, and is otherwise incomplete; then h is complete when each operation is. The
label of a complete operation o with inv(o) = ⟨m, x⟩ and ret(o) = y is ⟨m, x, y⟩.

To relate operations’ return values in a given history back to sequential specifica-
tions, we consider certain sequencings of those operations. A linearization of a history
h = ⟨O, _, _, hb⟩ is a total order lin ⊇ hb over O which includes hb, and a visibility


5 For brevity, we abbreviate Java’s remove and contains-value methods by rem and has.
6 Previous work refers to absolute visibility as complete, and includes additional visibility labels.
projection vis of lin maps each operation o ∈ O to a subset vis(o) ⊆ lin⁻¹(o) of the
operations preceding o in lin; note that ⟨o_1, o_2⟩ ∈ vis means o_1 observes o_2. For a
given read-only predicate R, we say o’s visibility is monotonic when it includes every
happens-before predecessor, and operation visible to a happens-before predecessor,
which is not read-only,⁷ i.e., vis(o) ⊇ (hb⁻¹(o) ∪ vis(hb⁻¹(o))) | R̄. We say o’s
visibility is absolute when vis(o) = lin⁻¹(o), and vis is itself absolute when each vis(o)
is. An abstract execution e = ⟨h, lin, vis⟩ is a history h along with a linearization of
h, and a visibility projection vis of lin. An abstract execution is sequential when hb is
total, complete when h is, and absolute when vis is.


Example 3. An abstract execution can be defined using the linearization⁸
⟨put, (1,1), ⊤⟩ ⟨get, 1, 1⟩ ⟨put, (0,1), ⊤⟩ ⟨put, (1,0), ⊥⟩ ⟨has, 1, ⊥⟩


along with a happens-before order that, compared to the linearization order, keeps
⟨has, 1, ⊥⟩ unordered w.r.t. ⟨put, (0,1), ⊤⟩ and ⟨put, (1,0), ⊥⟩, and a visibility projec-
tion where the visibility of every put and get includes all the linearization predecessors
and the visibility of ⟨has, 1, ⊥⟩ consists of ⟨put, (1,1), ⊤⟩ and ⟨put, (1,0), ⊥⟩. Recall
that in the argument (k, v) to put operations, the key k precedes value v.


To determine the consistency of individual histories against weak-visibility spec-
ifications, we consider adherence of their corresponding abstract executions. Let
h = ⟨O, inv, ret, hb⟩ be a history and e = ⟨h, lin, vis⟩ a complete abstract execu-
tion. Then e is consistent with a visibility annotation V and read-only predicate R if
for each operation o ∈ dom(lin) with inv(o) = ⟨m, _⟩, vis(o) is absolute or mono-
tonic, respectively, according to V(m) and R. The labeling λ_0 λ_1 ... of a total order
o_0 ≺ o_1 ≺ ... of complete operations is the sequence of operation labels, i.e., λ_i is the
label of o_i. Then e is consistent with a sequential specification S when the labeling⁹
of lin | (vis(o) ∪ {o}) is included in S, for each operation o ∈ dom(lin).¹⁰ Finally, we
say e is consistent with a weak-visibility specification ⟨S, R, V⟩ when it is consistent
with S, R, and V.


Example 4. The execution in Example 3 is consistent with the weakly-consistent
contains-value map W_m defined in Example 2.
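
The claim of Example 4 can be checked mechanically for this one execution. The Python sketch below encodes the five operations with the hypothetical identifiers 0 to 4 and tests the two visibility conditions and the sequential-specification condition for each operation. It assumes that put overwrites existing mappings, that monotonic visibility only has to include predecessors that are not read-only, and it handles complete executions only; the spec-membership helper is included so that the block is self-contained.

```python
TOP, BOT = "⊤", "⊥"

def expected(prefix, method, arg):
    """Return value prescribed by the key-value map spec after `prefix`."""
    store = {}
    for m, a, _ in prefix:
        if m == "put":
            store[a[0]] = a[1]
        elif m == "rem":
            store.pop(a, None)
    if method == "put":
        return TOP if arg[0] not in store else BOT
    if method == "rem":
        return TOP if arg in store else BOT
    if method == "get":
        return store.get(arg, BOT)
    return TOP if arg in store.values() else BOT   # has

def in_spec(seq):
    return all(expected(seq[:i], m, a) == r for i, (m, a, r) in enumerate(seq))

def read_only(label):
    m, _, r = label
    return m in ("get", "has") or r == BOT   # read-only predicate of the map example

VISIBILITY = {"put": "absolute", "rem": "absolute", "get": "absolute", "has": "monotonic"}

def consistent(labels, lin, hb, vis):
    """labels: id -> label; lin: list of ids; hb: set of (earlier, later); vis: id -> set."""
    for pos, o in enumerate(lin):
        before = set(lin[:pos])
        if not vis[o] <= before:
            return False                       # may only see linearization predecessors
        if VISIBILITY[labels[o][0]] == "absolute":
            if vis[o] != before:
                return False
        else:                                  # monotonic
            preds = {p for (p, q) in hb if q == o}
            required = preds | {v for p in preds for v in vis[p]}
            if not {r for r in required if not read_only(labels[r])} <= vis[o]:
                return False
        seq = [labels[p] for p in lin if p in vis[o] or p == o]
        if not in_spec(seq):
            return False
    return True

labels = {0: ("put", (1, 1), TOP), 1: ("get", 1, 1), 2: ("put", (0, 1), TOP),
          3: ("put", (1, 0), BOT), 4: ("has", 1, BOT)}
lin = [0, 1, 2, 3, 4]
hb = {(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 4), (1, 4)}
vis = {0: set(), 1: {0}, 2: {0, 1}, 3: {0, 1, 2}, 4: {0, 3}}
ok = consistent(labels, lin, hb, vis)
```

Here `ok` is `True`. Changing the return value of has to ⊤ while keeping the same visibility makes the check fail, since the operations it observes leave no key mapped to 1.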


Remark 1. Consistency models suited for modern software platforms like Java are based 
on happens-before relations which abstract away from real-time execution order. Since 
happens-before, unlike real-time, is not necessarily an interval order, the composition 


7 For convenience we rephrase the notion of Emmi and Enea [13] to ignore read-only predecessors.

8 For readability, we list linearization sequences with operation labels in place of identifiers.

9 As is standard, adequate labelings of incomplete executions are obtained by completing each
linearized yet pending operation with some arbitrarily-chosen return value [18]. It is sufficient
that one of these completions be included in the sequential specification.

10 We consider a simplification from prior work [13]: rather than allowing the observers of a
given operation to pretend they see distinct return values, we suppose that all observers agree
on return values. While this is more restrictive in principle, it is equivalent for the simple
specifications studied in this article.
of linearizations of two distinct objects in the same execution may be cyclic, i.e., not
linearizable. Recovering compositionality in this setting is orthogonal to our work of 
proving consistency against a given model, and is explored elsewhere [11]. 


The abstract executions E(W) of a weak-visibility specification W = ⟨S, R, V⟩
include those complete, sequential, and absolute abstract executions derived from
sequences of S, i.e., when s = λ_0 ... λ_n ∈ S then each e_s labels each o_i by λ_i, and
orders hb(o_i, o_j) iff i < j. In addition, when E(W) includes an abstract execution
⟨h, lin, vis⟩ with h = ⟨O, inv, ret, hb⟩, then E(W) also includes any:


- execution ⟨h′, lin, vis⟩ such that h′ = ⟨O, inv, ret, hb′⟩ and hb′ ⊆ hb; and
- W-consistent execution ⟨h′, lin, vis′⟩ with h′ = ⟨O, inv, ret′, hb⟩ and vis′ ⊆ vis.


Note that while happens-before weakening hb′ ⊆ hb always yields consistent executions,
unguarded visibility weakening vis′ ⊆ vis generally breaks consistency with visibility
annotations and sequential specifications: visibilities can become non-monotonic, and 
return values can change when operations observe fewer operations’ effects. 


Lemma 1. The abstract executions E(W) of a specification W are consistent with W. 


Example 5. The abstract executions of W_m include the complete, sequential, and abso-
lute abstract execution defined by the following happens-before order


⟨put, (1,1), ⊤⟩ ⟨get, 1, 1⟩ ⟨put, (0,1), ⊤⟩ ⟨put, (1,0), ⊥⟩ ⟨has, 1, ⊤⟩


which implies that it also includes one in which just the happens-before order is modi-
fied such that ⟨has, 1, ⊤⟩ becomes unordered w.r.t. ⟨put, (0,1), ⊤⟩ and ⟨put, (1,0), ⊥⟩.
Since it includes the latter, it also includes the execution in Example 3, where the
visibility of has is weakened, which also modifies its return value from ⊤ to ⊥.


Definition 2. The histories of a weak-visibility specification W are the projections
H(W) = {h : ⟨h, _, _⟩ ∈ E(W)} of its abstract executions.


2.2 Consistency against Weak-Visibility Specifications 


To define the consistency of implementations against specifications, we leverage a 
general model of computation to capture the behavior of typical concurrent systems, 
e.g., including multiprocess and multithreaded systems. A sequence-labeled transition 
system ⟨Q, A, q, →⟩ is a set Q of states, along with a set A of actions, initial state q ∈ Q
and transition relation → ⊆ Q × A* × Q. An execution is an alternating sequence
η = q_0 a⃗_0 q_1 a⃗_1 ... q_n of states and action sequences starting with q_0 = q such that
q_i -a⃗_i→ q_{i+1} for each 0 ≤ i < n. The trace τ ∈ A* of the execution η is its projection
a⃗_0 a⃗_1 ... to individual actions.

To capture the histories admitted by a given implementation, we consider sequence- 
labeled transition systems (SLTSs) which expose actions corresponding to method call, 
return, and happens-before constraints. We refer to the actions call(o, m, x), ret(o, y),
and hb(o, o′), for o, o′ ∈ ℕ, m ∈ M, and x, y ∈ X, as the history actions, and a history
transition system is an SLTS whose actions include the history actions. We say that an
action over operation identifier o is an o-action, and assume that executions are well
formed in the sense that for a given operation identifier o: at most one call o-action
occurs, at most one ret o-action occurs, and no ret nor hb o-actions occur prior to a
call o-action. Furthermore, we assume call o-actions are enabled, so long as no prior
call o-action has occurred. The history of a trace τ is defined inductively by f_h(h_∅, τ),
where h_∅ is the empty history, and,


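\[
\begin{aligned}
f_h(h, \varepsilon) &= h & g_h(h, \mathrm{call}(o, m, x)) &= \langle O \cup \{o\},\ \mathit{inv}[o \mapsto \langle m, x \rangle],\ \mathit{ret},\ \mathit{hb} \rangle \\
f_h(h, a\tau) &= f_h(g_h(h, a), \tau) & g_h(h, \mathrm{ret}(o, y)) &= \langle O,\ \mathit{inv},\ \mathit{ret}[o \mapsto y],\ \mathit{hb} \rangle \\
f_h(h, \tilde{a}\tau) &= f_h(h, \tau) & g_h(h, \mathrm{hb}(o, o')) &= \langle O,\ \mathit{inv},\ \mathit{ret},\ \mathit{hb} \cup \{\langle o, o' \rangle\} \rangle
\end{aligned}
\]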


where h = ⟨O, inv, ret, hb⟩, and a is a call, ret, or hb action, and ã is not. An imple-
mentation I is a history transition system, and the histories H(I) of I are those of its
traces. Finally, we define consistency against specifications via history containment.
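
The inductive definition of the history of a trace amounts to a left fold over the trace that reacts only to call, ret, and hb actions. A minimal Python sketch follows, assuming actions are encoded as tuples whose first component names the action kind (an encoding chosen for this example, as is the name `history_of`).

```python
def history_of(trace):
    """Fold the history actions of a trace into (O, inv, ret, hb)."""
    O, inv, ret, hb = set(), {}, {}, set()
    for kind, *args in trace:
        if kind == "call":
            o, m, x = args
            O.add(o)
            inv[o] = (m, x)          # inv[o -> (m, x)]
        elif kind == "ret":
            o, y = args
            ret[o] = y               # ret[o -> y]
        elif kind == "hb":
            o1, o2 = args
            hb.add((o1, o2))
        # any other action leaves the history unchanged
    return O, inv, ret, hb

trace = [("call", 0, "put", (1, 1)), ("internal",), ("ret", 0, "⊤"),
         ("call", 1, "get", 1), ("hb", 0, 1)]
O, inv, ret, hb = history_of(trace)
```

In the resulting history, operation 0 is complete and operation 1 is still incomplete, since no ret action for it occurs in the trace.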


Definition 3. Implementation I is consistent with specification W iff H(I) ⊆ H(W).


3 Establishing Consistency with Forward Simulation 


To obtain a consistency proof strategy, we more closely relate implementations to 
specifications via their admitted abstract executions. To capture the abstract executions 
admitted by a given implementation, we consider SLTSs which expose not only history- 
related actions, but also actions witnessing linearization and visibility. We refer to 
the actions lin(o) and vis(o, o′) for o, o′ ∈ ℕ, along with the history actions, as the
abstract-execution actions, and an abstract-execution transition system (AETS) is an SLTS
whose actions include the abstract-execution actions. Extending the corresponding
notion from history transition systems, we assume that executions are well formed in
the sense that for a given operation identifier o: at most one lin o-action occurs, and no
lin or vis o-actions occur prior to a call o-action. The abstract execution of a trace τ is
defined inductively by f_e(e_∅, τ), where e_∅ = ⟨h_∅, ∅, ∅⟩ is the empty execution, and,


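\[
\begin{aligned}
f_e(e, \varepsilon) &= e & g_e(e, \hat{a}) &= \langle g_h(h, \hat{a}),\ \mathit{lin},\ \mathit{vis} \rangle \\
f_e(e, a\tau) &= f_e(g_e(e, a), \tau) & g_e(e, \mathrm{lin}(o)) &= \langle h,\ \mathit{lin} \cup \{\langle o', o \rangle : o' \in \mathit{lin}\},\ \mathit{vis} \rangle \\
f_e(e, \tilde{a}\tau) &= f_e(e, \tau) & g_e(e, \mathrm{vis}(o, o')) &= \langle h,\ \mathit{lin},\ \mathit{vis} \cup \{\langle o, o' \rangle\} \rangle
\end{aligned}
\]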


where e = ⟨h, lin, vis⟩, and a is a call, ret, hb, lin, or vis action, ã is not, and â is a
call, ret, or hb action. A witnessing implementation I is an abstract-execution transition
system, and the abstract executions E(I) of I are those of its traces.
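
The abstract execution of a trace extends the same fold with lin and vis actions. In the Python sketch below, which again assumes a tuple encoding of actions and a hypothetical function name, the linearization is accumulated as a list and converted to a set of ordered pairs at the end; under the well-formedness assumption of at most one lin action per operation, this matches adding ⟨o′, o⟩ for every already-linearized o′.

```python
def execution_of(trace):
    """Fold a trace into ((O, inv, ret, hb), lin, vis)."""
    O, inv, ret, hb = set(), {}, {}, set()
    order, vis = [], set()
    for kind, *args in trace:
        if kind == "call":
            o, m, x = args
            O.add(o)
            inv[o] = (m, x)
        elif kind == "ret":
            o, y = args
            ret[o] = y
        elif kind == "hb":
            hb.add(tuple(args))
        elif kind == "lin":
            (o,) = args
            order.append(o)          # o follows every operation linearized so far
        elif kind == "vis":
            vis.add(tuple(args))     # (o, o2): operation o observes o2
        # any other action leaves the execution unchanged
    lin = {(order[i], order[j])
           for i in range(len(order)) for j in range(i + 1, len(order))}
    return (O, inv, ret, hb), lin, vis

trace = [("call", 0, "put", (1, 1)), ("lin", 0), ("ret", 0, "⊤"),
         ("call", 1, "has", 1), ("vis", 1, 0), ("lin", 1), ("ret", 1, "⊤")]
history, lin, vis = execution_of(trace)
```

For this trace, `lin` orders operation 0 before operation 1, and `vis` records that operation 1 observes operation 0.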

We adopt forward simulation for proving consistency against weak-visibility
specifications. Formally, a simulation relation from one system
$\Sigma_1 = \langle Q_1, A_1, \chi_1, \rightarrow_1 \rangle$ to another
$\Sigma_2 = \langle Q_2, A_2, \chi_2, \rightarrow_2 \rangle$ is a binary relation
$R \subseteq Q_1 \times Q_2$ such that initial states are related, $R(\chi_1, \chi_2)$, and:
for any pair of related states $R(q_1, q_2)$ and source-system transition
$q_1 \xrightarrow{\vec{a}_1}_1 q_1'$, there exists a target-system transition
$q_2 \xrightarrow{\vec{a}_2}_2 q_2'$ to related states, i.e., $R(q_1', q_2')$, over common
actions, i.e., $(\vec{a}_1 \mid A_2) = (\vec{a}_2 \mid A_1)$. We say $\Sigma_2$ simulates
$\Sigma_1$ and write $\Sigma_1 \sqsubseteq \Sigma_2$ when a simulation relation from
$\Sigma_1$ to $\Sigma_2$ exists.
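
For finite systems, the existence of a simulation relation can be decided by a greatest
fixpoint computation. The sketch below is illustrative only and makes simplifying
assumptions that the definition above does not: both systems are finite, every transition
carries a single action, and the two systems share one alphabet (so the projections onto
common actions are the identity). The function name and the triple encoding are hypothetical.

```python
def simulates(spec, impl):
    """Return True if `spec` simulates `impl`.

    Each system is a triple (states, init, trans), where trans is a set of
    (source, action, target) triples.
    """
    s_states, s_init, s_trans = spec
    i_states, i_init, i_trans = impl
    # Start from the full relation and remove pairs that violate the condition.
    rel = {(qi, qs) for qi in i_states for qs in s_states}
    changed = True
    while changed:
        changed = False
        for (qi, qs) in list(rel):
            for (src, a, qi2) in i_trans:
                if src != qi:
                    continue
                matched = any(s == qs and b == a and (qi2, qs2) in rel
                              for (s, b, qs2) in s_trans)
                if not matched:
                    rel.discard((qi, qs))
                    changed = True
                    break
    return (i_init, s_init) in rel


impl = ({0, 1}, 0, {(0, "a", 1)})
spec = ({"p", "q", "r"}, "p", {("p", "a", "q"), ("p", "b", "r")})
```

With these two toy systems, spec simulates impl, but not the other way around, since impl
cannot match the b-transition of spec.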

We derive transition systems to model consistency specifications in simulation. The
following lemma establishes the soundness and completeness of this substitution, and
the subsequent theorem asserts the soundness of the simulation-based proof strategy.

Definition 4. The transition system $\llbracket W \rrbracket_s$ of a weak-visibility
specification W is the AETS whose actions are the abstract-execution actions, whose states
are abstract executions, whose initial state is the empty execution, and whose transitions
include $e_1 \xrightarrow{\vec{a}} e_2$ iff $f_e(e_1, \vec{a}) = e_2$ and $e_2$ is consistent
with W.


Lemma 2. A weak-visibility spec. and its transition system have identical histories. 


Theorem 1. A witnessing implementation I is consistent with a weak-visibility
specification W if the transition system $\llbracket W \rrbracket_s$ of W simulates I.


Our notion of simulation is in some sense complete when the sequential specification S
of a weak-consistency specification $W = \langle S, R, V \rangle$ is return-value
deterministic, i.e., there is a single label $\langle m, x, y \rangle$ such that
$\lambda \cdot \langle m, x, y \rangle \in S$ for any method m, argument-value x, and
admitted sequence $\lambda \in S$. In particular, $\llbracket W \rrbracket_s$ simulates any
witnessing implementation I whose abstract executions E(I) are included in
$E(\llbracket W \rrbracket_s)$.¹¹ This completeness, however, extends only to inclusion of
abstract executions, and not all the way to consistency, since consistency is defined on
histories, and any given operation’s return value is not completely determined by the
other operation labels and happens-before relation of a given history: return values
generally depend on linearization order and visibility as well. Nevertheless, sequential
specifications typically are return-value deterministic, and we have used simulation to
prove consistency of Java-inspired weakly-consistent objects.

Establishing simulation for an implementation is also helpful when reasoning 
about clients of a concurrent object. One can use the specification in place of the 
implementation and encode the client invariants using the abstract execution of the 
specification in order to prove client properties, following Sergey et al.’s approach [35].


3.1 Reducing Consistency to Safety Verification 


Proving simulation between an implementation and its specification can generally be 
achieved via product construction: complete the transition system of the specification, 
replacing non-enabled transitions with error-state transitions; then ensure the synchro- 
nized product of implementation and completed-specification transition systems is safe, 
i.e., no error state is reachable. Assuming that the individual transition systems are 
safe, then the product system is safe iff the specification simulates the implementation. 
This reduction to safety verification is also generally applicable to implementation 
and specification programs, though we limit our formalization to their underlying 
transition systems for simplicity. By the upcoming Corollary 1, such reductions enable
consistency verification with existing safety verification tools. 
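
The product construction can be sketched as a reachability check. The Python below is
illustrative only and assumes more than the text does: both systems are finite,
transitions carry single actions, and the specification is deterministic (at most one
successor per state and action), so that completing it with an error state yields a total
step function. The names ERROR, complete, and product_is_safe are hypothetical.

```python
from collections import deque

ERROR = "error"  # hypothetical error state added when completing the specification


def complete(spec_trans):
    """Total step function for a deterministic spec; non-enabled moves go to ERROR."""
    table = {(q, a): q2 for (q, a, q2) in spec_trans}

    def step(q, a):
        if q == ERROR:
            return ERROR
        return table.get((q, a), ERROR)

    return step


def product_is_safe(impl_init, impl_trans, spec_init, spec_trans):
    """Explore the synchronized product; it is safe iff ERROR is never reached."""
    step = complete(spec_trans)
    seen = {(impl_init, spec_init)}
    queue = deque(seen)
    while queue:
        qi, qs = queue.popleft()
        for (src, a, qi2) in impl_trans:
            if src != qi:
                continue
            qs2 = step(qs, a)
            if qs2 == ERROR:
                return False
            if (qi2, qs2) not in seen:
                seen.add((qi2, qs2))
                queue.append((qi2, qs2))
    return True


spec_trans = {("s0", "call", "s1"), ("s1", "ret", "s0")}
good_impl = {(0, "call", 1), (1, "ret", 0)}
bad_impl = {(0, "call", 1), (1, "call", 2)}
```

In this toy instance the second implementation issues two calls in a row, which the
completed specification maps to the error state.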


3.2 Verifying Implementations 


While Theorem 1 establishes forward simulation as a strategy for proving the
consistency of implementations against weak-visibility specifications, its application to
real-world implementations requires program-level mechanisms to signal the underlying
AETS lin and vis actions. To apply forward simulation, we thus develop a notion of
programs whose commands include such mechanisms.

11 This is a consequence of a generic result stating that the set of traces of an LTS $A_1$
is included in the set of traces of an LTS $A_2$ iff $A_2$ simulates $A_1$, provided that
$A_2$ is deterministic [25].

This section illustrates a toy programming language with AETS semantics which 
provides these mechanisms. The key features are the lin and vis program commands, 
which emit linearization and visibility actions for the currently-executing operation, 
along with load, store, and cas (compare-and-swap) commands, which record and return 
the set of operation identifiers having written to each memory cell. Such augmented
memory commands allow programs to obtain handles to the operations whose effects
they have observed, in order to signal the corresponding vis actions.

While one can develop similar mechanisms for languages with any underlying 
memory model, the toy language presented here assumes a sequentially-consistent 
memory. Note that the assumption of sequentially-consistent memory operations is 
practically without loss of generality for Java 8’s concurrent collections since they are 
designed to be data-race free — their anomalies arise not from weak-memory semantics, 
but from non-atomic operations spanning several memory cells. 

For generality, we assume abstract notions of commands and memory, using $\kappa$,
$\mu$, $\ell$, and $M$ respectively to denote a program command, memory command, local
state, and global memory. So that operations can assert their visibilities, we consider
memory which stores, and returns upon access, the identifier(s) of operations which
previously accessed a given cell. A program P = ⟨init, cmd, idle, done⟩ consists of an
$\mathit{init}(m, x) = \ell$ function mapping method name m and argument values x to local
state $\ell$, along with a $\mathit{cmd}(\ell) = \kappa$ function mapping local state $\ell$
to program command $\kappa$, and $\mathit{idle}(\ell)$ and $\mathit{done}(\ell)$ predicates
on local states $\ell$. Intuitively, identifying local states with threads, the idle
predicate indicates whether a thread is outside of atomic sections, and subject to
interference from other threads; meanwhile the done predicate indicates whether a thread
has terminated.

The denotation of a memory command $\mu$ is a function $\llbracket \mu \rrbracket_m$ from
global memory $M_1$, argument value x, and operation o to a tuple
$\llbracket \mu \rrbracket_m(M_1, x, o) = \langle M_2, y \rangle$ consisting of a global
memory $M_2$, along with a return value y.


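Example 6. A sequentially-consistent memory system which records the set of operations
to access each location can be captured by mapping addresses x to value and operation-set
pairs $M(x) = \langle y, O \rangle$, along with three memory commands:

$$
\begin{aligned}
\llbracket \mathit{load} \rrbracket_m(M, x, \_) &= \langle M, M(x) \rangle \\
\llbracket \mathit{store} \rrbracket_m(M, xy, o) &= \langle M[x \mapsto \langle y, M(x)_1 \cup \{o\} \rangle], \varepsilon \rangle \\
\llbracket \mathit{cas} \rrbracket_m(M, xyz, o) &=
  \begin{cases}
    \langle M[x \mapsto \langle z, M(x)_1 \cup \{o\} \rangle], \langle \mathit{true}, M(x)_1 \rangle \rangle & \text{if } M(x)_0 = y \\
    \langle M, \langle \mathit{false}, M(x)_1 \rangle \rangle & \text{if } M(x)_0 \neq y
  \end{cases}
\end{aligned}
$$

where the compare-and-swap (CAS) operation stores value z at address x and returns
true when y was previously stored, and otherwise returns false.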
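
As a concrete reading of Example 6, the three memory commands can be sketched as pure
functions in Python. This is an illustrative sketch only: the dictionary encoding of
memory, the use of frozenset for operation sets, and the function signatures are
assumptions, not part of the paper's formalization.

```python
def load(mem, addr, op=None):
    """Return the memory unchanged, together with the (value, tags) pair at addr."""
    return mem, mem[addr]


def store(mem, addr, value, op):
    """Write value at addr and add the writing operation to the cell's tag set."""
    _, tags = mem[addr]
    new_mem = dict(mem)
    new_mem[addr] = (value, tags | {op})
    return new_mem, None


def cas(mem, addr, expected, value, op):
    """Compare-and-swap; in both cases the previous tag set is returned."""
    current, tags = mem[addr]
    if current == expected:
        new_mem = dict(mem)
        new_mem[addr] = (value, tags | {op})
        return new_mem, (True, tags)
    return mem, (False, tags)


m0 = {"x": (0, frozenset())}
m1, _ = store(m0, "x", 5, op=1)
m2, r_ok = cas(m1, "x", 5, 7, op=2)
m3, r_fail = cas(m2, "x", 5, 9, op=3)
```

Returning the tag set on every access is what later lets an operation name the operations
whose writes it has observed.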


The denotation of a program command $\kappa$ is a function $\llbracket \kappa \rrbracket_c$
from local state $\ell_1$ to a tuple
$\llbracket \kappa \rrbracket_c(\ell_1) = \langle \mu, x, f \rangle$ consisting of a memory
command $\mu$ and argument value x, and an update continuation f mapping the memory
command’s return value y to a pair $f(y) = \langle \ell_2, \alpha \rangle$, where $\ell_2$ is
an updated local state, and $\alpha$ maps an operation o to an LTS action $\alpha(o)$. We
assume the denotation
$\llbracket \mathit{ret}\ z \rrbracket_c(\ell_1) = \langle \mathit{nop}, \varepsilon, \lambda y. \langle \ell_2, \lambda o. \mathit{ret}(z) \rangle \rangle$
of the ret command yields a local state $\ell_2$ with $\mathit{done}(\ell_2)$ without
executing memory commands, and outputs a corresponding LTS ret action.


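Example 7. A simple goto language over variables a, b, ... for the memory system of
Example 6 would include the following commands:

$$
\begin{aligned}
\llbracket \mathit{goto}\ a \rrbracket_c(\ell) &= \langle \mathit{nop}, \varepsilon, \lambda y. \langle \mathit{jump}(\ell, \ell(a)), \lambda o. \varepsilon \rangle \rangle \\
\llbracket \mathit{assume}\ a \rrbracket_c(\ell) &= \langle \mathit{nop}, \varepsilon, \lambda y. \langle \mathit{next}(\ell), \lambda o. \varepsilon \rangle \rangle \quad \text{if } \ell(a) \neq 0 \\
\llbracket b, c = \mathit{load}(a) \rrbracket_c(\ell) &= \langle \mathit{load}, \ell(a), \lambda y_1, y_2. \langle \mathit{next}(\ell[b \mapsto y_1][c \mapsto y_2]), \lambda o. \varepsilon \rangle \rangle \\
\llbracket \mathit{store}(a, b) \rrbracket_c(\ell) &= \langle \mathit{store}, \ell(a)\ell(b), \lambda y. \langle \mathit{next}(\ell), \lambda o. \varepsilon \rangle \rangle \\
\llbracket d, e = \mathit{cas}(a, b, c) \rrbracket_c(\ell) &= \langle \mathit{cas}, \ell(a)\ell(b)\ell(c), \lambda y_1, y_2. \langle \mathit{next}(\ell[d \mapsto y_1][e \mapsto y_2]), \lambda o. \varepsilon \rangle \rangle
\end{aligned}
$$

where the jump and next functions update a program counter, and the load command
stores the operation identifier returned from the corresponding memory commands.
Linearization and visibility actions are captured as program commands as follows:

$$
\begin{aligned}
\llbracket \mathit{lin} \rrbracket_c(\ell) &= \langle \mathit{nop}, \varepsilon, \lambda y. \langle \mathit{next}(\ell), \lambda o. \mathit{lin}(o) \rangle \rangle \\
\llbracket \mathit{vis}(a) \rrbracket_c(\ell) &= \langle \mathit{nop}, \varepsilon, \lambda y. \langle \mathit{next}(\ell), \lambda o. \mathit{vis}(o, \ell(a)) \rangle \rangle
\end{aligned}
$$

Atomic sections can be captured with a lock variable and a pair of program commands,

$$
\begin{aligned}
\llbracket \mathit{begin} \rrbracket_c(\ell) &= \langle \mathit{nop}, \varepsilon, \lambda y. \langle \mathit{next}(\ell[\mathit{lock} \mapsto \mathit{true}]), \lambda o. \varepsilon \rangle \rangle \\
\llbracket \mathit{end} \rrbracket_c(\ell) &= \langle \mathit{nop}, \varepsilon, \lambda y. \langle \mathit{next}(\ell[\mathit{lock} \mapsto \mathit{false}]), \lambda o. \varepsilon \rangle \rangle
\end{aligned}
$$

such that idle states are identified by not holding the lock, i.e.,
$\mathit{idle}(\ell) = \neg \ell(\mathit{lock})$, as in the initial state
$\mathit{init}(m, x)(\mathit{lock}) = \mathit{false}$.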


Figure 1 lists the semantics $\llbracket P \rrbracket_p$ of a program P as an
abstract-execution transition system. The states $\langle M, L \rangle$ of
$\llbracket P \rrbracket_p$ include a global memory M, along with a partial function L from
operation identifiers o to local states L(o); the initial state is
$\langle M_\emptyset, \emptyset \rangle$, where $M_\emptyset$ is an initial memory state.
The transitions for call and hb actions are enabled independently of implementation state,
since they are dictated by implementations’ environments. Although we do not explicitly
model client programs and platforms here, in reality, client programs dictate call
actions, and platforms, driven by client programs, dictate hb actions; for example, a
client which acquires the lock released after operation $o_1$, before invoking operation
$o_2$, is generally ensured by its platform that $o_1$ happens before $o_2$. The
transitions for all other actions are dictated by implementation commands. While the ret,
lin, and vis commands generate their corresponding LTS actions, all other commands
generate $\varepsilon$ transitions.

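$$
\frac{o \notin \mathrm{dom}(L) \qquad \ell = \mathit{init}(m, x)}
     {\langle M, L \rangle \xrightarrow{\mathit{call}(o, m, x)} \langle M, L[o \mapsto \ell] \rangle}
\qquad
\frac{\mathit{done}(L(o_1)) \qquad o_2 \notin \mathrm{dom}(L)}
     {\langle M, L \rangle \xrightarrow{\mathit{hb}(o_1, o_2)} \langle M, L \rangle}
$$

$$
\frac{\langle M_1, \ell_1, o, \varepsilon \rangle \leadsto^{*} \langle M_2, \ell_2, o, \vec{a} \rangle \qquad \mathit{idle}(\ell_2)}
     {\langle M_1, L[o \mapsto \ell_1] \rangle \xrightarrow{\vec{a}} \langle M_2, L[o \mapsto \ell_2] \rangle}
$$

$$
\frac{\mathit{cmd}(\ell_1) = \kappa \qquad \llbracket \kappa \rrbracket_c(\ell_1) = \langle \mu, x, f \rangle \qquad
      \llbracket \mu \rrbracket_m(M_1, x, o) = \langle M_2, y \rangle \qquad f(y) = \langle \ell_2, \alpha \rangle}
     {\langle M_1, \ell_1, o, \vec{a} \rangle \leadsto \langle M_2, \ell_2, o, \vec{a} \cdot \alpha(o) \rangle}
$$

Fig. 1. The semantics of program P = ⟨init, cmd, idle, done⟩ as an abstract-execution
transition system, where $\llbracket \cdot \rrbracket_c$ and $\llbracket \cdot \rrbracket_m$
are the denotations of program and memory commands, respectively.

Each atomic $\xrightarrow{\vec{a}}$ step of the AETS underlying a given program is built
from a sequence of $\leadsto$ steps for the individual program commands in an atomic
section. Individual program commands essentially execute one small $\leadsto$ step from
shared memory and local state $\langle M_1, \ell_1 \rangle$ to $\langle M_2, \ell_2 \rangle$,
invoking memory command $\mu$ with argument x, and emitting action $\alpha(o)$. Besides its
effect on shared memory, each step uses the result $\langle M_2, y \rangle$ of memory
command $\mu$ to update local state and emit an action using the continuation f, i.e.,
$f(y) = \langle \ell_2, \alpha \rangle$. Commands which do not access memory are modeled by
a no-op memory command. We define the consistency of programs by reduction to their
transition systems.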


Definition 5. A program P is consistent with a specification iff its semantics
$\llbracket P \rrbracket_p$ is.


Thus the consistency of P with W amounts to the inclusion of
$\llbracket P \rrbracket_p$’s histories in W’s. The following corollary of Theorem 1 follows
directly by Definition 5, and immediately yields a program verification strategy: validate
a simulation relation from the states of $\llbracket P \rrbracket_p$ to the states of
$\llbracket W \rrbracket_s$ such that each command of P is simulated by a step of
$\llbracket W \rrbracket_s$.


Corollary 1. A program P is consistent with specification W if
$\llbracket W \rrbracket_s$ simulates $\llbracket P \rrbracket_p$.


4 Proof Methodology 


In this section we develop a systematic means of annotating concurrent objects for
relaxed-visibility simulation proofs. Besides leveraging an auxiliary memory system
which tags memory accesses with the operation identifiers which wrote read values
(see §3.2), annotations signal linearization points with lin commands, and indicate
visibility of other operations with vis commands. As in previous works we
assume linearization points are given, and focus on visibility-related annotations.

As we focus on data-race free implementations (e.g., Java 8’s concurrent collections)
for which sequential consistency is sound, it can be assumed without loss of generality
that the happens-before order is exactly the returns-before order between operations,
which orders two operations $o_1$ and $o_2$ iff the return action of $o_1$ occurs in
real-time before the call action of $o_2$. This assumption allows us to guarantee that
linearizations are consistent with happens-before just by ensuring that the linearization
point of each operation occurs in between its call and return action (like in standard
linearizability).
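
To make the returns-before order concrete, the following Python sketch derives it from a
real-time sequence of call and return events and checks a linearization against it. The
event encoding and both function names are assumptions made for illustration.

```python
def returns_before(events):
    """Pairs (o1, o2) such that o1 returned before o2 was called.

    `events` is a list of ("call", o) and ("ret", o) pairs in real-time order.
    """
    returned = []
    order = set()
    for kind, o in events:
        if kind == "call":
            order.update((o1, o) for o1 in returned)
        elif kind == "ret":
            returned.append(o)
    return order


def respects(lin, order):
    """Check that the sequence `lin` orders every pair of `order` the same way."""
    position = {o: i for i, o in enumerate(lin)}
    return all(position[a] < position[b] for (a, b) in order)


events = [("call", 1), ("call", 2), ("ret", 1), ("call", 3), ("ret", 2), ("ret", 3)]
hb = returns_before(events)
```

In this run only operation 1 returns before operation 3 is called, so any linearization
placing 1 before 3 is consistent with happens-before, regardless of where 2 is placed.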
    var table: array of T;

    procedure absolute put(k: int, v: T) {
      atomic {
        store(table[k], v);
        vis(getLin());
        lin();
      }
    }

    procedure absolute get(k: int) {
      atomic {
        v, O = load(table[k]);
        vis(getLin());
        lin();
      }
      return v;
    }

    procedure monotonic has(v: T)
    vis(getModLin());
    {
      store(k, 0);
      while (k < table.length) {
        atomic {
          tv, O = load(table[k]);
          vis(O ∩ getModLin());
        }
        if (tv = v) then {
          lin();
          return true;
        }
        inc(k);
      }
      lin();
      return false;
    }


Fig. 2. An implementation $I_{chm}$ modeling Java’s concurrent hash map. The command inc(k)
increments counter k, and commands within atomic {...} are collectively atomic.


It is without loss of generality because the clients of such implementations can use 
auxiliary variables to impose synchronization order constraints between every two 
operations ordered by returns-before, e.g., writing a variable after each operation 
returns which is read before each other operation is called (under sequential consistency, 
every write happens-before every other read which reads the written value). 


We illustrate our methodology with the key-value map implementation $I_{chm}$ of
Figure 2, which models Java’s concurrent hash map. The lines marked in blue and
red represent linearization/visibility commands added by the instrumentation that
will be described below. Key-value pairs are stored in an array table indexed by keys.
The implementations of put and get are obvious, while the implementation of has, which
returns true iff the input value is associated with some key, consists of a while loop
traversing the array and searching for the input value. To simplify the exposition, the
shared memory reads and writes are already adapted to the memory system described
in Section 3.2 (essentially, this consists in adding new variables storing the set of
operation identifiers returned by a shared memory read). While put and get are
obviously linearizable, has is weakly consistent, with monotonic visibility. For instance,
given the two-thread program {get(1); has(1)} || {put(1, 1); put(0, 1); put(1, 0)} it
is possible that get(1) returns 1 while has(1) returns false. This is possible in an
interleaving where has reads table[0] before put(0, 1) writes into it (observing the
initial value 0), and table[1] after put(1, 0) writes into it (observing value 0 as well).
The only abstract execution consistent with the weakly-consistent contains-value map
$W_m$ (Example 2) which justifies these return values is given in Example 3. We show
that this implementation is consistent with a simplification of the contains-value map
$W_m$, without remove key operations, and where put operations return no value.
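
The interleaving described above can be replayed step by step. The Python sketch below
hard-codes that one schedule over a two-entry table; it is an illustration under the
assumption that the initial value of every entry is 0, and the function name is
hypothetical.

```python
def run_interleaving():
    """Replay {get(1); has(1)} || {put(1, 1); put(0, 1); put(1, 0)} in one fixed schedule."""
    table = [0, 0]              # assumed initial value 0 at every key
    snapshots = []              # table contents after each put

    table[1] = 1                # thread B: put(1, 1)
    snapshots.append(list(table))
    got = table[1]              # thread A: get(1)
    seen0 = table[0]            # thread A: has(1) reads table[0]
    table[0] = 1                # thread B: put(0, 1)
    snapshots.append(list(table))
    table[1] = 0                # thread B: put(1, 0)
    snapshots.append(list(table))
    seen1 = table[1]            # thread A: has(1) reads table[1]

    found = (seen0 == 1) or (seen1 == 1)
    return got, found, snapshots


got, found, snapshots = run_interleaving()
```

Every snapshot taken after put(1, 1) contains the value 1, so no single point in time
after get(1) observed 1 could justify has(1) returning false; this is why has needs a
visibility weaker than absolute.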


Given an implementation I, let L(I) be an instrumentation of I with program
commands lin() emitting linearization actions. The execution of lin() in the context
of an operation with identifier o emits a linearization action lin(o). We assume that L(I)
leads to well-formed executions (e.g., at most one linearization action per operation).

Example 8. For the implementation in Figure 2, the linearization commands of put
and get are executed atomically with the store to table[k] in put and the load of 
table[k] in get, respectively. The linearization command of has is executed at any 
point after observing the input value v or after exiting the loop, but before the return. 
The two choices correspond to different return values and only one of them will be 
executed during an invocation. 


Given an instrumentation L(I), a visibility annotation V for I’s methods, and a
read-only predicate R, we define a witnessing implementation V(L(I)) according to
a generic heuristic that depends only on V and R. This definition uses a program
command getLin() which returns the set of operations in the current linearization
sequence.¹² The current linearization sequence is stored in a history variable which
is updated with every linearization action by appending the corresponding operation
identifier. For readability, we leave this history variable implicit and omit the
corresponding updates. As syntactic sugar, we use a command getModLin() which returns
the set of modifiers (non read-only operations) in the current linearization sequence.
To represent visibility actions, we use program commands vis(A) where A is a set
of operation identifiers. The execution of vis(A) in the context of an operation with
identifier o emits the set of visibility actions vis(o, o') for every operation $o' \in A$.

Therefore, V(L(I)) extends the instrumentation L(I) with commands generating 
visibility actions as follows: 


- for absolute methods, each linearization command is preceded by vis(getLin())
  which ensures that the visibility of an invocation includes all the predecessors in
  linearization order. This is executed atomically with lin().

- for monotonic methods, the call action is followed by vis(getModLin()) (and
  executed atomically with this command) which ensures that the visibility of each
  invocation is monotonic, and every read of a shared variable which has been written
  by a set of operations O is preceded by vis(O ∩ getModLin()) (and executed
  atomically with this command). The latter is needed so that the visibility of such
  an invocation contains enough operations to explain its return value (the visibility
  command attached to call actions is enough to ensure monotonic visibilities).
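
The two annotation rules can be summarized by the visibility actions they emit. The
following Python sketch is illustrative only: it represents the linearization sequence as
a list, the modifiers as a set, and each emitted vis(o, o') action as a pair; all three
function names are hypothetical.

```python
def vis_absolute(o, lin):
    """Absolute methods: vis(getLin()) just before lin(), so o sees all of lin."""
    return {(o, p) for p in lin}


def vis_on_call(o, lin, modifiers):
    """Monotonic methods, at the call: vis(getModLin())."""
    return {(o, p) for p in lin if p in modifiers}


def vis_on_read(o, tags, lin, modifiers):
    """Monotonic methods, at a shared read tagged with writers `tags`:
    vis(O ∩ getModLin())."""
    return {(o, p) for p in tags if p in lin and p in modifiers}


lin = ["put1", "get1", "put2"]
modifiers = {"put1", "put2", "put3"}   # put3 is a modifier that is not yet linearized
```

Note that a writer appearing in the memory tags but not yet in the linearization sequence
(put3 here) is not made visible by the read.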


Example 9. The blue lines in Figure 2 demonstrate the visibility commands added by
the instrumentation V(·) to the key-value map in Figure 2 (in this case, the modifiers
are put operations). The first visibility command in has precedes the procedure body
to emphasize the fact that it is executed atomically with the procedure call. Also, note
that the read of the array table is the only shared memory read in has.


Theorem 2. The abstract executions of the witnessing implementation V(L(I)) are 
consistent with V and R. 


Proof. Let (h, lin, vis) be the abstract execution of a trace $\tau$ of V(L(I)), and let o be
an invocation in h of a monotonic method (w.r.t. V). By the definition of V, the call
action of o is immediately followed in $\tau$ by a sequence of visibility actions vis(o, o')
for every modifier o' which has been already linearized. Therefore, any operation
which has returned before o (i.e., happens-before o) has already been linearized and it
will necessarily have a smaller visibility (w.r.t. set inclusion) because the linearization
sequence is modified only by appending new operations. The instrumentation of
shared memory reads may add more visibility actions vis(o, _) but this preserves the
monotonicity status of o’s visibility. The case of absolute methods is obvious.

12 We rely on retrieving the identifiers of currently-linearized operations. More complex
proofs may also require inspecting, e.g., operation labels and happens-before relationships.


The consistency of the abstract executions of V(L(I)) with a given sequential
specification S, which completes the proof of consistency with a weak-visibility
specification $W = \langle S, R, V \rangle$, can be proved by showing that the transition
system $\llbracket W \rrbracket_s$ of W simulates V(L(I)) (Theorem 1). Defining a simulation
relation between the two systems is in some part implementation specific, and in the
following we demonstrate it for the key-value map implementation $V(L(I_{chm}))$.

We show that $\llbracket W_m \rrbracket_s$ simulates implementation $I_{chm}$. A state of
$I_{chm}$ in Figure 2 is a valuation of table and the history variable lin storing the
current linearization sequence, and a valuation of the local variables for each active
operation. Let ops(q) denote the set of operations which are active in an implementation
state q. Also, for a has operation $o \in \mathit{ops}(q)$, let index(o) be the maximal index
k of the array table such that o has already read table[k] and table[k] ≠ v. We assume
index(o) = −1 if o did not read any array cell.


Definition 6. Let $R_{chm}$ be a relation which associates every implementation state q with
a state of $\llbracket W_m \rrbracket_s$, i.e., an ⟨S, R, V⟩-consistent abstract execution
e = (h, lin, vis) with h = (O, inv, ret, hb), such that:

1. O is the set of identifiers occurring in ops(q) or the history variable lin,
2. for each operation o ∈ ops(q), inv(o) is defined according to its local state, ret(o) is
   undefined, and o is maximal in the happens-before order hb,
3. the value of the history variable lin in q equals the linearization sequence lin,
4. every invocation o ∈ ops(q) of an absolute method (put or get) has absolute visibility
   if linearized, otherwise, its visibility is empty,
5. table is the array obtained by executing the sequence of operations lin,
6. for every linearized get(k) operation o ∈ ops(q), the put(k, _) operation in vis(o)
   which occurs last in lin writes v to key k, where v is the local variable of o,
7. for every has operation o ∈ ops(q), vis(o) consists of:
   - all the put operations o' which returned before o was invoked,
   - for each i < index(o), all the put(i, _) operations from a prefix of lin that
     wrote a value different from v,
   - all the put(index(o) + 1, _) operations from a prefix of lin that ends with a
     put(index(o) + 1, v) operation, provided that tv = v.
   Above, the linearization prefix associated to an index $j_1 < j_2$ should be a prefix of
   the one associated to $j_2$.


A large part of this definition is applicable to any implementation, only points (5),
(6), and (7) being specific to the implementation we consider. The points (6) and (7)
ensure that the return values of operations are consistent with S and mimic the effect
of the vis commands from Figure 2.


Theorem 3. $R_{chm}$ is a simulation relation from $V(L(I_{chm}))$ to
$\llbracket W_m \rrbracket_s$.


5 Implementation and Evaluation


In this section we effectuate our methodology by verifying two weakly-consistent
concurrent objects: Java’s ConcurrentHashMap and ConcurrentLinkedQueue.¹³ We
use an off-the-shelf deductive verification tool called CIVL [16], though any concurrent
program verifier could suffice. We chose CIVL because comparable verifiers either
require a manual encoding of the concurrency reasoning (e.g., Dafny or Viper), which
can be error-prone, or require cumbersome reasoning about interleavings of
thread-local histories (e.g., VerCors). An additional benefit of CIVL is that it directly
proves simulation, thereby tying the mechanized proofs to our theoretical development. Our
proofs assume no bound on the number of threads or the size of the memory.

Our use of CIVL imposes two restrictions on the implementations we can verify.
First, CIVL uses the Owicki-Gries method to verify concurrent programs. This
method is unsound for weak memory models [22], so CIVL, and hence our proofs,
assume a sequentially-consistent memory model. Second, CIVL’s strategy for building
the simulation relation requires implementations to have statically-known linearization
points because it checks that there exists exactly one atomic section in each code path
where the global state is modified, and this modification is simulated by the specification.

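Given these restrictions, we can simplify our proof strategy of forward refinement
by factoring the simulations we construct through an atomic version of the specification
transition system. This atomic specification is obtained from the specification AETS
$\llbracket W \rrbracket_s$ by restricting the interleavings between its transitions.


Definition 7. The atomic transition system of a specification W is the AETS
$\llbracket W \rrbracket_a = \langle Q, A, q, \rightarrow_a \rangle$, where
$\llbracket W \rrbracket_s = \langle Q, A, q, \rightarrow \rangle$ is the AETS of W and
$e_1 \xrightarrow{\vec{a}}_a e_2$ if and only if $e_1 \xrightarrow{\vec{a}} e_2$ and
$\vec{a} \in \{ \mathit{call}(o, m, x) \} \cup \{ \mathit{ret}(o, y) \} \cup \{ \mathit{hb}(o, o') \} \cup \{ \vec{a}' \cdot \mathit{lin}(o) : \vec{a}' \in \{ \mathit{vis}(o, \_) \}^{*} \}$.


Note that the language of $\llbracket W \rrbracket_a$ is included in the language of
$\llbracket W \rrbracket_s$, and simulation proofs towards $\llbracket W \rrbracket_a$ apply
to $\llbracket W \rrbracket_s$ as well.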

Our CIVL proofs show that there is a simulation from an implementation to its atomic
specification, which is encoded as a program whose state consists of the components
of an abstract execution, i.e., (O, inv, ret, hb, lin, vis). These were encoded as maps
from operation identifiers to values, sequences of operation identifiers, and maps from
operation identifiers to sets of operation identifiers respectively. Our axiomatizations
of sequences and sets were adapted from those used by the Dafny verifier [23]. For
each method in M, we defined atomic procedures corresponding to call actions, return
actions, and combined visibility and linearization actions in order to obtain exactly the
atomic transitions of $\llbracket W \rrbracket_a$.

13 Our verified implementations are open source, and available at:
https://github.com/siddharth-krishna/weak-consistency-proofs

It is challenging to encode Java implementations faithfully in CIVL, as the latter’s
input programming language is a basic imperative language lacking many Java features.
Most notable among these is dynamic memory allocation on the heap, used by almost
all of the concurrent data structure implementations. As CIVL is a first-order prover,
we needed an encoding of the heap that lets us perform reachability reasoning on the
heap. We adapted the first-order theory of reachability and footprint sets from the
GRASShopper verifier for dynamically allocated data structures. This fragment is
decidable, but relies on local theory extensions [36], which we implemented by using
the trigger mechanism of the underlying SMT solver to ensure that quantified
axioms were only instantiated for program expressions. For instance, here is the “cycle”
axiom that says that if a node x has a field f[x] that points to itself, then any y that
it can reach via that field (encoded using the between predicate Btwn(f, x, y, y))
must be equal to x:


axiom (forall f: [Ref]Ref, x: Ref, y:Ref :: {known(x), known(y)} 
f[x] == x && Btwn(f, x, y, y) ==> x == y); 


We use the trigger known(x), known(y) (known is a dummy function that maps every
reference to true) and introduce known(t) terms in our programs for every term t of
type Ref (for instance, by adding assert known(t) to the point of the program where
t is introduced). This ensures that the cycle axiom is only instantiated for terms that
appear in the program, and not for terms that are generated by instantiations of axioms
(like f[x] in the cycle axiom). This process was key to keeping the verification time
manageable.

Since we consider fine-grained concurrent implementations, we also needed to
reason about interference by other threads and show thread safety. CIVL provides
Owicki-Gries style thread-modular reasoning, by means of demarcating atomic
blocks and providing preconditions for each block that are checked for stability under
all possible modifications by other threads. One of the consequences of this is that
these annotations can only talk about the local state of a thread and the shared global
state, but not other threads. To encode facts such as distinctness of operation identifiers
and ownership of unreachable nodes (e.g., newly allocated nodes) in the shared heap,
we use CIVL’s linear type system [40].

For instance, the proof of the push method needs to make assertions about the value
of the newly-allocated node x. These assertions would not be stable under interference
of other threads if we didn’t have a way of specifying that the address of the new node
is known only by the push thread. We encode this knowledge by marking the type of
the variable x as linear: this tells CIVL that all values of x across all threads are
distinct, which is sufficient for the proof. CIVL ensures soundness by making sure that
linear variables are not duplicated (for instance, they cannot be passed to another method
and then used afterwards).

We evaluate our proof methodology by considering models of two of Java’s weakly- 
consistent concurrent objects. 


Concurrent Hash Map. One is the ConcurrentHashMap implementation of the Map
ADT, consisting of absolute put and get methods and a monotonic has method that
follows the algorithm given in Figure 2. For simplicity, we assume here that keys are
integers and the hash function is identity, but note that the proof of monotonicity of
has is not affected by these assumptions.¹⁴


14 Our CIVL implementation assumes the hash function is injective to avoid reasoning about the
dynamic bucket-list needed to resolve hash collisions. While such reasoning is possible within


Module                         Code  Proof  Total  Time (s)
Sets and Sequences                -     85     85         -
Executions and Consistency        -     30     30         7
Heap and Reachability             -     35     35         -
Map ADT                          51     34     85         s
Array-map implementation        138    175    313         6
Queue ADT                        50     22     72         -
Linked Queue implementation     280    325    605        13


Fig. 3. Case study detail: for each object we show lines of code, lines of proof, total lines, and
verification time in seconds. We also list common definitions and axiomatizations separately.


CIVL can construct a simulation relation equivalent to the one defined in Definition 6
automatically, given an inductive invariant that relates the state of the implementation
to the abstract execution. A first attempt at an invariant might be that the value stored
at table[k] for every key k is the same as the value returned by adding a get operation
on k by the specification AETS. This invariant is sufficient for CIVL to prove that the
return value of the absolute methods (put and get) is consistent with the specification.

However, it is not enough to show that the return value of the monotonic has
method is consistent with its visibility. This is because our proof technique constructs
a visibility set for has by taking the union of the memory tags (the set of operations
that wrote to each memory location) of each table entry it reads, but without additional
invariants this visibility set could entail a different return value. We thus strengthen
the invariant to say that tableTags[k], the memory tags associated with hash table
entry k, is exactly the set of linearized put operations with key k. A consequence of
this is that the abstract state encoded by tableTags[k] has the same value for key k as
the value stored at table[k]. CIVL can then prove, given the following loop invariant,
that the value returned by has is consistent with its visibility set.


(forall i: int :: 0 <= i && i < k ==> Map.ofVis(my_vis, lin)[i] != v)


This loop invariant says that among the entries scanned thus far, the abstract map 
given by the projection of lin to the current operation’s visibility my_vis does not 
include value v. 
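
To illustrate what this loop invariant states, the following Python sketch replays the
earlier three-put scenario. It is not the CIVL encoding: map_of_vis is a hypothetical
stand-in for Map.ofVis, the dictionary puts (mapping operation identifiers to key-value
pairs) is an assumed representation, and unwritten keys are assumed to hold the value 0.

```python
def map_of_vis(my_vis, lin, puts):
    """Abstract map obtained by replaying, in lin order, the visible put operations."""
    abstract = {}
    for o in lin:
        if o in my_vis and o in puts:
            key, value = puts[o]
            abstract[key] = value
    return abstract


def invariant_holds(k, v, my_vis, lin, puts, default=0):
    """Entries 0..k-1 of the abstract map do not hold value v."""
    abstract = map_of_vis(my_vis, lin, puts)
    return all(abstract.get(i, default) != v for i in range(k))


puts = {"p1": (1, 1), "p2": (0, 1), "p3": (1, 0)}   # put(1,1), put(0,1), put(1,0)
lin = ["p1", "p2", "p3"]
```

For has(1) in the interleaving discussed earlier, the visibility is {p1} after scanning
entry 0 and {p1, p3} after scanning entry 1, and the invariant holds in both cases; it
would fail if the visibility also contained p2, which writes 1 at key 0.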


Concurrent Linked Queue. Our second case study is the ConcurrentLinkedQueue
implementation of the Queue ADT, consisting of absolute push and pop methods and
a monotonic size method that traverses the queue from head to tail without any locks
and returns the number of nodes it sees (see Figure 4 for the full code). We again model
the core algorithm (the Michael-Scott queue [26]) and omit some of Java’s optimizations,
for instance to speed up garbage collection by setting the next field of popped nodes
to themselves, or setting the values of nodes to null when popping values.

The invariants needed to verify the absolute methods are a straightforward combi- 
nation of structural invariants (e.g. that the queue is composed of a linked list from 
the head to null, with the tail being a member of this list) and a relation between the 


CIVL, see our queue case study, this issue is orthogonal to the weak-consistency reasoning 
that we study here. 
var head, tail: Ref; struct Node { var data: K; var next: Ref; } 

procedure absolute push(k: K) { 
  x = new Node(k, null); 
  while (true) { 
    t, _ = load(tail); 
    tn, _ = load(tail.next); 
    if (tn == null) { 
      atomic { 
        b, _ = cas(t.next, tn, x); 
        if (b) { 
          vis(getLin()); 
          lin(); 
        } 
      } 
      if (b) then break; 
    } else { 
      b, _ = cas(tail, t, tn); 
    } 
  } 
} 

procedure absolute pop() { 
  while (true) { 
    h, _ = load(head); 
    t, _ = load(tail); 
    hn, _ = load(h.next); 
    if (h != t) { 
      k, _ = load(hn.data); 
      atomic { 
        b, _ = cas(head, h, hn); 
        if (b) { 
          vis(getLin()); 
          lin(); 
        } 
      } 
      if (b) then return k; 
    } 
  } 
} 

procedure monotonic size() { 
  vis(getModLin()); 
  store(s, 0); 
  c, _ = load(head); 
  atomic { 
    cn, O = load(c.next); 
    vis(O ∩ getModLin()); 
  } 
  while (cn != null) { 
    inc(s); 
    c = cn; 
    atomic { 
      cn, O = load(c.next); 
      vis(O ∩ getModLin()); 
    } 
  } 
  lin(); 
  return s; 
} 


Fig. 4. The simplified implementation of Java’s ConcurrentLinkedQueue that we verify. 


abstract and concrete states. Once again, we need to strengthen this invariant in order 
to verify the monotonic size method, because otherwise we cannot prove that the 
visibility set we construct (by taking the union of the memory tags of nodes in the list 
during traversal) justifies the return value. 

The key additional invariant is that the memory tags for the next field of each node 
(denoted x.nextTags for each node x) in the queue contain the operation label of the 
operation that pushed the next node into the queue (if it exists). Further, the sequence 
of push operations in lin are exactly the operations in the nextTags field of nodes in 
the queue, and in the order they are present in the queue. 

Figure 5 shows a simplified version of the CIVL encoding of these invariants. In 
it, we use the following auxiliary variables in order to avoid quantifier alternation: 
nextInvoc maps nodes to the operation label (type Invoc in CIVL) contained in the 
nextTags field; nextRef maps operations to the nodes whose nextTags field contains 
them, i.e. it is the inverse of nextInvoc; and absRefs maps the index of the abstract 
queue (represented as a mathematical sequence) to the corresponding concrete heap 
node. We omit the triggers and known predicates for readability; the full invariant can 
be found in the accompanying proof scripts. 

Given these invariants, one can show that the return value s computed by size 
is consistent with the visibility set it constructs by picking up the memory tags from 
each node that it traverses. The loop invariant is more involved, as due to concurrent 
updates size could be traversing nodes that have been popped from the queue; see 
our CIVL proofs for more details. 


Results. Figure 3 provides a summary of our case studies. We separate the table into 
sections, one for each case study, and a common section at the top that contains the 
common theories of sets and sequences and our encoding of the heap. In each case study 
section, we separate the definitions of the atomic specification of the ADT (which can 
// nextTags only contains singleton sets of push operations 
(forall y: Ref :: 
  (Btwn(next, start, y, null) && y != null && next[y] != null 
    ==> nextTags[y] == Set(nextInvoc[y]) 
      && invoc_m(nextInvoc[y]) == Queue.push)) 

// nextTags of the last node is the empty set 
&& nextTags[absRefs[Queue.stateTail(Queue.ofSeq(lin)) - 1]] 
  == Set_empty() 

// lin is made up of nextInvoc[y] for y in the queue 
&& (forall n: Invoc :: invoc_m(n) == Queue.push 
  ==> (Seq_elem(n, lin) 
    <==> Btwn(next, start, nextRef[n], null) 
      && nextRef[n] != null && next[nextRef[n]] != null)) 

// lin is ordered by order of nodes in queue 
&& (forall n1, n2: Invoc :: 
  (invoc_m(n1) == Queue.push && invoc_m(n2) == Queue.push 
    && Seq_elem(n1, lin) && Seq_elem(n2, lin) 
    ==> (Seq_ord(lin, n1, n2) 
      <==> Btwn(next, nextRef[n1], nextRef[n1], nextRef[n2]) 
        && nextRef[n1] != nextRef[n2]))) 


Fig. 5. A snippet from the CIVL invariant for the queue. 


be reused for other implementations) from the code and proof of the implementation 
we consider. For each resulting module, we list the number of lines of code, lines of 
proof, total lines, and CIVL’s verification time in seconds. Experiments were conducted 
on an Intel Core i7-4470 3.4 GHz 8-core machine with 16GB RAM. 

Our two case studies are representative of the weakly-consistent behaviors exhibited 
by all the Java concurrent objects studied in [13], both those using fixed-size arrays 
and those using dynamic memory. As CIVL does not directly support dynamic memory 
and other Java language features, we were forced to make certain simplifications 
to the algorithms in our verification effort. However, the assumptions we make are 
orthogonal to the reasoning and proof of weak consistency of the monotonic methods. 
The underlying algorithm used by, and hence the proof argument for monotonicity 
of, hash map’s has method is the same as that in the other monotonic hash map 
operations such as elements, entrySet, and toString. Similarly, the argument used 
for the queue’s size can be adapted to other monotonic ConcurrentLinkedQueue 
and LinkedTransferQueue operations like toArray and toString. Thus, our proofs 
carry over to the full versions of the implementations as the key invariants linking the 
memory tags and visibility sets to the specification state are the same. 

In addition, CIVL does not yet support inferring the preconditions of each atomic 
block, and these preconditions account for most of the lines of proof in our case 
studies. However, these problems have been studied and solved in other tools [B0][39], 
and in theory can be integrated with CIVL in order to simplify these kinds of proofs. 
In conclusion, our case studies show that verifying weakly-consistent operations 
introduces little overhead compared to the proofs of the core absolute operations. The 
additional invariants needed to prove monotonicity were natural and easy to construct. 
We also see that our methodology brings weak-consistency proofs within the scope of 
what is provable by off-the-shelf automated concurrent program verifiers in reasonable 
time. 


6 Related Work 


Though linearizability has reigned as the de-facto concurrent-object consistency 
criterion, several recent works proposed weaker criteria, including quantitative re- 
laxation [17], quiescent consistency [10], and local linearizability [14]; these works 
effectively permit externally-visible interference among threads by altering objects’ se- 
quential specifications, each in their own way. Motivated by the diversity of these 
proposals, Sergey et al. proposed the use of Hoare logic for describing a custom 
consistency specification for each concurrent object. Raad et al. continued in this 
direction by proposing declarative consistency models for concurrent objects atop 
weak-memory platforms. One common feature between our paper and this line of 
work (see also [21][9]) is encoding and reasoning directly about the concurrent history. 
The notion of visibility relaxation originates from Burckhardt et al.’s axiomatic 
specifications [7], and leverages traditional sequential specifications by allowing certain 
operations to behave as if they are unaware of concurrently-executed linearization- 
order predecessors. The linearization (and visibility) actions of our simulation-proof 
methodology are unique to visibility-relaxation based weak-consistency, since they 
refer to a global linearization order linking executions with sequential specifications. 


Typical methodologies for proving linearizability are based on reductions to safety 
verification and forward simulation [2], the latter generally requiring 
the annotation of per-operation linearization points, each typically associated with 
a single program statement in the given operation, e.g., a shared memory access. 
Extensions to this methodology include cooperation [38]T2]41], i.e., allowing operations’ 
linearization points to coincide with other operations’ statements, and prophecy [33][24], 
i.e., allowing operations’ linearization points to depend on future events. Such extensions 
enable linearizability proofs of objects like the Herlihy-Wing Queue (HWQ). While 
prophecy [25], alternatively backward simulation [25], is generally more powerful 
than forward simulation alone, Bouajjani et al. [6] described a methodology based on 
forward simulation capable of proving seemingly future-dependent objects like HWQ 
by considering fixed linearization points only for value removal, and an additional 
kind of specification-simulated action, commit points, corresponding to operations’ 
final shared-memory accesses. Our consideration of specification-simulated visibility 
actions follows this line of thinking, enabling the forward-simulation based proof of 
weakly-consistent concurrent objects. 
7 Conclusion and Future Work 


This work develops the first verification methodology for weakly-consistent operations 
using sequential specifications and forward simulation, thus reusing existing sequential 
ADT specifications and enabling simple reasoning, i.e., without prophecy [1] or back- 
ward simulation [25]. This paper demonstrates the application of our methodology to 
absolute and monotonic methods on sequentially-consistent memory, as these are the 
consistency levels demonstrated in actual Java implementations of which we are aware. 
Our formalization is general, and also applicable to the other visibility relaxations, 
e.g., the peer and weak visibilities [13], and weaker memory models, e.g., the Java 
memory model. 

Extrapolating, we speculate that handling other visibilities amounts to adding anno- 
tations and auxiliary state which mirrors inter-operation communication. For example, 
while monotonic operations on shared-memory implementations observe mutating 
linearization-order predecessors — corresponding to a sequence of shared-memory up- 
dates — causal operations with message-passing based implementations would observe 
operations whose messages have (transitively) propagated. The corresponding anno- 
tations may require auxiliary state to track message propagation, similar in spirit to 
the getModLin() auxiliary state that tracks mutating linearization-order predecessors 
(§4). Since weak memory models essentially alter the mechanics of inter-operation 
communication, the corresponding visibility annotations and auxiliary state may simi- 
larly reflect this communication. Since this communication is partly captured by the 
denotations of memory commands (§3.2), these denotations would be modified, e.g., to 
include not one value and tag per memory location, but multiple. While variations are 
possible depending on the extent to which the proof of a given implementation relies 
on the details of the memory model, in the worst case the auxiliary state could capture 
an existing memory model (e.g., operational) semantics exactly. 

As with systematic or automated linearizability-proof methodologies, our proof 
methodology is susceptible to two potential sources of incompleteness. First, as mentioned 
in Section 3, methodologies like ours based on forward simulation are only 
complete when specifications are return-value deterministic. However, data types are 
typically designed to be return-value deterministic and this source of incompleteness 
does not manifest in practice. 

Second, methodologies like ours based on annotating program commands, e.g., with 
linearization points, are generally incomplete since the consistency mechanism em- 
ployed by any given implementation may not admit characterization according to a 
given static annotation scheme; the Herlihy-Wing Queue, whose linearization points 
depend on the results of future actions, is a prototypical example [18]. Likewise, our 
systematic strategy for annotating implementations with lin and vis commands (§3) 
can fail to prove consistency of future-dependent operations. However, we have yet 
to observe any practical occurrence of such exotic objects; our strategy is sufficient 
for verifying the weakly-consistent algorithms implemented in the Java development 
kit. As a theoretical curiosity for future work, investigating the potential for complete 
annotation strategies would be interesting, e.g., for restricted classes of data types 
and/or implementations. 
Finally, while CIVL’s high degree of automation facilitated rapid prototyping of 
our simulation proofs, its underlying foundation using Owicki-Gries style proof rules 
limits the potential for modular reasoning. In particular, while our weak-consistency 
proofs are thread-modular, our invariants and intermediate assertions necessarily talk 
about state shared among multiple threads. Since our simulation-based methodology 
and annotations are completely orthogonal to the underlying program logic, it would 
be interesting future work to apply our methodology using expressive logics like 
Rely-Guarantee, e.g. [19] [38], or variations of Concurrent Separation Logic, e.g. 
[35] [4] [20]. It remains to be seen to what degree increased modularity may sacrifice 
automation in the application of our weak-consistency proof methodology. 
A Appendix: Proofs to Theorems and Lemmas 


Lemma 1. The abstract executions E(W) of a specification W are consistent with W. 


Proof: Any complete, sequential, and absolute execution is consistent by definition, 
since the labeling of its linearization is taken from the sequential specification. Then, 
any happens-before weakening is consistent for exactly the same reason as its source 
execution, since its linearization and visibility projection are both identical. Finally, any 
visibility weakening is consistent by the condition of W-consistency in its definition. 


Lemma 2. A weak-visibility specification and its transition system have identical 
histories. 


Proof. It follows almost immediately that the abstract executions of [W]_s are identical 
to those of W, since the state of [W]_s effectively records the abstract execution of a given 
AETS execution, and only enables those returns that are consistent with W. Since 
histories are the projections of abstract executions, the corresponding history sets are 
also identical. 


Theorem. A witnessing implementation I is consistent with a weak-visibility 
specification W if the transition system [W]_s of W simulates I. 


Proof. This follows from standard arguments, given that the corresponding SLTSs 
include ε transitions to ensure that every move of one system can be matched by 
stuttering from the other: since both systems synchronize on the call, ret, hb, lin, and 
vis actions, the simulation guarantees that every abstract execution, and thus history, 
of I is matched by one of [W]_s. Then by Lemma 2, the histories of I are included in 
W. 


Theorem 3. R_chm is a simulation relation from I_chm to [W_m]_s. 


Proof Sketch. We show that every step of the implementation, i.e., an atomic section 
or a program command, is simulated by [W_m]_s. Given (q, e) ∈ R_chm, we consider the 
different implementation steps which are possible in q. 

The case of commands corresponding to procedure calls of put and get is trivial. 


Executing a procedure call in q leads to a new state q′ which differs only by having 
a new active operation o. We have that $e \xrightarrow{\mathit{call}(o,\_,\_)} e'$ and 
(q′, e′) ∈ R_chm, where e′ is obtained from e by adding o with an appropriate value of 
inv(o) and an empty visibility. 

The transition corresponding to the atomic section of put is labeled by a sequence 
of visibility actions (one for each linearized operation) followed by a linearization 
action. Let σ denote this sequence of actions. This transition leads to a state q′ where 
the array table may have changed (unless writing the same value), and the history 
variable lin is extended with the put operation o executing this step. We define an 
abstract execution e′ from e by changing lin to the new value of lin, and defining an 
absolute visibility for o. We have that $e \xrightarrow{\sigma} e'$ because e′ is consistent 
with W_m. Also, (q′, e′) ∈ R_chm because the validity of (3), (4), and (5) follows directly 
from the definition of e′. The atomic section of get can be handled in a similar way. The 
simulation of return actions of get operations is a direct consequence of point (6), which 
ensures consistency with S. 

For has, we focus on the atomic sections containing vis commands and the 
linearization commands (the other internal steps are simulated by ε steps of [W_m]_s, and 
the simulation of the return step follows directly from (7), which justifies the 
consistency of the return value). The atomic section around the procedure call corresponds 
to a transition labeled by a sequence σ of visibility actions (one for each linearized 
modifier) and leads to a state q′ with a new active has operation o (compared to q). 
We have that $e \xrightarrow{\sigma} e'$ because e′ is consistent with W_m. Indeed, the 
visibility of o in e′ is not constrained since o has not been linearized, and the 
W_m-consistency of e′ follows from the W_m-consistency of e. Also, (q′, e′) ∈ R_chm because 
index(o) = −1 and (7) is clearly valid. The atomic section around the read of table[k] is 
simulated by [W_m]_s in a similar way, noticing that (7) models precisely the effect of the 
visibility commands inside this atomic section. For the simulation of the linearization 
commands, it is important to notice that any active has operation in e has a visibility that 
contains all modifiers which returned before it was called and, as explained above, this 
visibility is monotonic. 
Abstract. Separation logics are widely used for verifying programs that manipulate 
complex heap-based data structures. These logics build on so-called separation 
algebras, which allow expressing properties of heap regions such that modifica- 
tions to a region do not invalidate properties stated about the remainder of the heap. 
This concept is key to enabling modular reasoning and also extends to concurrency. 
While heaps are naturally related to mathematical graphs, many ubiquitous graph 
properties are non-local in character, such as reachability between nodes, path 
lengths, acyclicity and other structural invariants, as well as data invariants which 
combine with these notions. Reasoning modularly about such graph properties 
remains notoriously difficult, since a local modification can have side-effects on a 
global property that cannot be easily confined to a small region. 

In this paper, we address the question: What separation algebra can be used to 
avoid proof arguments reverting back to tedious global reasoning in such cases? 
To this end, we consider a general class of global graph properties expressed as 
fixpoints of algebraic equations over graphs. We present mathematical foundations 
for reasoning about this class of properties, imposing minimal requirements on the 
underlying theory that allow us to define a suitable separation algebra. Building 
on this theory, we develop a general proof technique for modular reasoning about 
global graph properties expressed over program heaps, in a way which can be 
directly integrated with existing separation logics. To demonstrate our approach, 
we present local proofs for two challenging examples: a priority inheritance 
protocol and the non-blocking concurrent Harris list. 


1 Introduction 


Separation logic (SL) provides the basis of many successful verification tools that 
can verify programs manipulating complex data structures [1][4][17][29]. This success is 
due to the logic’s support for reasoning modularly about modifications to heap-based data. 
For simple inductive data structures such as lists and trees, much of this reasoning can 
be automated [2][17][20][33]. However, these techniques often fail when data structures 
are less regular (e.g. multiple overlaid data structures) or provide multiple traversal 
patterns (e.g. threaded trees). Such idioms are prevalent in real-world implementations 
such as the fine-grained concurrent data structures found in operating systems and 
databases. Solutions to these problems have been proposed but remain difficult to 
automate. For proofs of general graph algorithms, the situation is even more dire. Despite 
substantial improvements in the verification methodology for such algorithms [35][38], 
significant parts of the proof argument still typically need to be carried out using 
non-local reasoning [7][8][13][25]. This paper presents a general technique for local reasoning 
 1 method acquire(p: Node, r: Node) { 
 2   if (r.next == null) { 
 3     r.next := p; update(p, -1, r.curr_prio) 
 4   } else { 
 5     p.next := r; update(r, -1, p.curr_prio) 
 6 } } 
 7 
 8 method update(n: Node, from: Int, to: Int) { 
 9   n.prios := n.prios \ {from} 
10   if (to >= 0) n.prios := n.prios ∪ {to} 
11   from := n.curr_prio 
12   n.curr_prio := max(n.prios ∪ {n.def_prio}) 
13   to := n.curr_prio; 
14   if (from != to && n.next != null) { 
15     update(n.next, from, to) 
16 } } 


Fig. 1: Pseudocode of the PIP and a state of the protocol data structure. Round nodes 
represent processes and rectangular nodes resources. Nodes are marked with their default 
priorities def_prio as well as the aggregate priority multiset prios. A node’s current 
priority curr_prio is underlined and marked in bold blue. 


about global graph properties that can be used within off-the-shelf separation logics. 
We demonstrate our technique using two challenging examples for which no fully local 
proof existed before, respectively, whose proof required a tailor-made logic. 

As a motivating example, we consider an idealized priority inheritance protocol (PIP), 
a technique used in process scheduling [39]. The purpose of the protocol is to avoid 
priority inversion, i.e. a situation where a low-priority process causes a high-priority 
process to be blocked. The protocol maintains a bipartite graph with nodes representing 
processes and resources. An example graph is shown in Fig. 1. An edge from a process 
p to a resource r indicates that p is waiting for r to be available, whereas an edge in 
the other direction means that r is currently held by p. Every node has an associated 
default priority and current priority, both natural numbers. The current priority is used 
for scheduling processes. When a process attempts to acquire a resource currently held by 
another process, the graph is updated to avoid priority inversion. For example, when 
process p1 with current priority 3 attempts to acquire the resource r1 held by process 
p2 of priority 1, p1’s higher priority is propagated to p2 and, transitively, to any other 
process that p2 is waiting for (p3 in this case). As a result, all nodes on the created cycle 
will get current priority 3. The protocol maintains the following invariant: the current 
priority of each node is the maximum of its default priority and the current priorities of 
all its predecessors. Priority propagation is implemented by the method update shown 
in Fig. 1. The implementation represents graph edges by next pointers and handles both 
adding an edge (acquire) and removing one (release; code omitted). To recalculate 
the current priority of a node (line 12), each node maintains its default priority def_prio 
and a multiset prios which contains the priorities of all its immediate predecessors. 
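
To make the propagation concrete, the following is a minimal Python transcription of the pseudocode in Fig. 1, together with a check of the stated invariant. The `Node` class, the use of `collections.Counter` as the multiset, the parameter name `frm` (since `from` is a Python keyword), and the helper `invariant_holds` are assumptions of this sketch and not part of the original presentation; the example scenario is an acyclic chain chosen for illustration.

```python
from collections import Counter


class Node:
    def __init__(self, def_prio):
        self.def_prio = def_prio
        self.curr_prio = def_prio
        self.next = None
        self.prios = Counter()  # multiset of predecessor priorities


def update(n, frm, to):
    if n.prios[frm] > 0:        # multiset difference: drop one copy of frm
        n.prios[frm] -= 1
    if to >= 0:
        n.prios[to] += 1
    frm = n.curr_prio
    n.curr_prio = max(list(n.prios.elements()) + [n.def_prio])
    to = n.curr_prio
    if frm != to and n.next is not None:
        update(n.next, frm, to)


def acquire(p, r):
    if r.next is None:          # r is free: r is now held by p
        r.next = p
        update(p, -1, r.curr_prio)
    else:                       # r is taken: p waits for r
        p.next = r
        update(r, -1, p.curr_prio)


def invariant_holds(nodes):
    """Each node's current priority is the max of its default priority
    and the current priorities of its predecessors."""
    for n in nodes:
        preds = [m.curr_prio for m in nodes if m.next is n]
        if n.curr_prio != max(preds + [n.def_prio]):
            return False
    return True


p1, p2, p3 = Node(3), Node(1), Node(2)
r1, r2 = Node(0), Node(0)
nodes = [p1, p2, p3, r1, r2]
acquire(p2, r1)   # p2 holds r1
acquire(p3, r2)   # p3 holds r2
acquire(p2, r2)   # p2 waits for r2
acquire(p1, r1)   # p1 waits for r1; priority 3 propagates down the chain
```

After the last call, the priority of p1 has been propagated to r1, p2, r2, and p3, and the invariant check succeeds on the whole graph.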

Verifying that the PIP maintains its invariant using established separation logic (SL) 
techniques is challenging. In general, SL assertions describe resources and express the 
fact that the program has permission to access and manipulate these resources. In what 


> The cycle can be used to detect/handle a deadlock; this is not the concern of this data structure. 
follows, we stick to the standard model of SL where resources are memory regions 
represented as partial heaps. We sometimes view partial heaps more abstractly as partial 
graphs (hereafter, simply graphs). Assertions describing larger regions are built from 
smaller ones using separating conjunction, $\phi_1 * \phi_2$. Semantically, the $*$ operator is tied to 
a notion of resource composition defined by an underlying separation algebra [5][6]. In 
the standard model, composition enforces that $\phi_1$ and $\phi_2$ must describe disjoint regions. 
The logic and algebra are set up so that changes to the region $\phi_1$ do not affect $\phi_2$ (and 
vice versa). That is, if $\phi_1 * \phi_2$ holds before the modification and $\phi_1$ is changed to $\phi_1'$, 
then $\phi_1' * \phi_2$ holds afterwards. This so-called frame rule enables modular reasoning 
about modifications to the heap and extends well to the concurrent setting when threads 
operate on disjoint portions of memory [3][9][10][36]. However, the mere fact that $\phi_2$ is 
preserved by modifications to $\phi_1$ does not guarantee that if a global property such as the 
PIP invariant holds for $\phi_1 * \phi_2$, it also still holds for $\phi_1' * \phi_2$. 

For example, consider the PIP scenario depicted in Fig. 1. If $\phi_1$ describes the 
subgraph containing only node p1, $\phi_2$ the remainder of the graph, and $\phi_1'$ the graph 
obtained from $\phi_1$ by adding the edge from p1 to r1, then the PIP invariant will no longer 
hold for the new composed graph described by $\phi_1' * \phi_2$. On the other hand, if $\phi_1$ captures 
p1 and the nodes reachable from r1 (i.e., the set of nodes modified by update), $\phi_2$ the 
remainder of the graph, and we reestablish the PIP invariant locally in $\phi_1$ obtaining $\phi_1''$ 
(i.e., run update to completion), then $\phi_1'' * \phi_2$ will also globally satisfy the PIP invariant. 
The separating conjunction $*$ is not sufficient to differentiate these two cases; both 
describe valid partitions of a possible program heap. As a consequence, prior techniques 
have to revert back to non-local reasoning to prove that the invariant is maintained. 

A first helpful idea towards a solution to this problem is that of iterated separating 
conjunction BOKA, which describes a graph G consisting of a set of nodes X by a 
formula $\Psi = \mathop{\ast}_{x \in X} N(x)$ where N(x) is some predicate that holds locally for every 
node x ∈ X. Using such node-local conditions one can naturally express non-inductive 
properties of graphs (e.g. “G has no outgoing edges” or “G is bipartite”). The advantages 
of this style of specification are two-fold. First, one can arbitrarily decompose 
and recompose Ψ by splitting X into disjoint subsets. For example, if X is partitioned 
into $X_1$ and $X_2$, then Ψ is equivalent to $\mathop{\ast}_{x \in X_1} N(x) * \mathop{\ast}_{x \in X_2} N(x)$. Moreover, it is 
very easy to prove that Ψ is preserved under modifications of subgraphs. For instance, 
if a program modifies the subgraph induced by $X_1$ such that $\mathop{\ast}_{x \in X_1} N(x)$ is preserved 
locally, then the frame rule guarantees that Ψ will be preserved in the new larger graph. 
Iterated separating conjunction thus yields a simple proof technique for local reasoning 
about graph properties that can be described in terms of node-local conditions. However, 
this idea alone does not actually solve our problem because general global graph 
properties such as “G is a directed acyclic graph”, “G is an overlay of multiple trees”, or “G 
satisfies the PIP invariant” cannot be directly described via node-local conditions. 


Solution. The key ingredient of our approach is the concept of a flow of a graph: a 
function fl from the nodes of the graph to flow values. For the PIP, the flow maps 
each node to the multiset of its incoming priorities. In general, a flow is a fixpoint of 
a set of algebraic equations induced by the graph. These equations are defined over a 
flow domain, which determines how flow values are propagated along the edges of the 
graph and how they are aggregated at each node. In the PIP example, an edge between 
nodes (n, n′) propagates the multiset containing max(fl(n), n.def_prio) from n to 
n′. The multisets arriving at n′ are aggregated with multiset union to obtain fl(n′). 
Flows enable capturing global graph properties in terms of node-local conditions. For 
example, the PIP invariant can be expressed by the following node-local condition: 
n.curr_prio = max(fl(n),n.def_prio). To enable compositional reasoning about 
such properties we need an appropriate separation algebra allowing us to prove locally 
that modifications to a subgraph do not affect the flow of the remainder of the graph. 
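
As an illustration of the PIP flow described above, the following Python sketch computes fl by naive iteration on a small acyclic graph and then reads off the node-local condition. The function name `pip_flow`, the dictionary-based graph encoding, and the iteration bound `max_rounds` are assumptions of this sketch; it is not the paper's formal definition of flows, and on graphs where the iteration does not stabilize it simply gives up.

```python
from collections import Counter


def pip_flow(def_prio, edges, max_rounds=100):
    """Iterate the PIP flow equations until they stabilize.

    def_prio: dict mapping each node to its default priority.
    edges:    iterable of (n, n2) pairs, one per graph edge.
    Returns a dict mapping each node to the multiset of incoming priorities.
    """
    fl = {n: Counter() for n in def_prio}
    for _ in range(max_rounds):
        new = {n: Counter() for n in def_prio}
        for n, n2 in edges:
            # The edge (n, n2) propagates max(fl(n) ∪ {n.def_prio}) to n2;
            # arriving values are aggregated by multiset union.
            out = max(list(fl[n].elements()) + [def_prio[n]])
            new[n2][out] += 1
        if new == fl:
            return fl
        fl = new
    raise RuntimeError("no fixpoint reached within max_rounds")


def_prio = {"p1": 3, "r1": 0, "p2": 1, "r2": 0, "p3": 2}
edges = [("p1", "r1"), ("r1", "p2"), ("p2", "r2"), ("r2", "p3")]
fl = pip_flow(def_prio, edges)

# Node-local condition: curr_prio(n) = max(fl(n) ∪ {def_prio(n)}).
curr_prio = {n: max(list(fl[n].elements()) + [def_prio[n]]) for n in def_prio}
```

On this chain every node ends up with current priority 3, the default priority of p1, which is exactly what the global PIP invariant demands, although each node only inspects its own flow value.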

To this end, we make the useful observation that a separation algebra induces a 
notion of an interface of a resource: we say that two resources a and a’ are equivalent 
if they compose with the same resources. The interface of a resource a could then be 
defined as a’s equivalence class, but more-succinct and simpler representations may be 
possible. In the standard model of SL where resources are graphs and composition is 
disjoint graph union, the interface of a graph G is the set of all graphs G’ that have the 
same domain as G; in this model, a graph’s domain could be defined to be its interface. 

The interfaces of resources described by assertions capture the information that is 
implicitly communicated when these assertions are conjoined by separating conjunction. 
As we discussed earlier, in the standard model of SL, this information is too weak to 
enable local reasoning about global properties of the composed graphs because some 
additional information about the subgraphs’ structure other than which nodes they 
contain must be communicated. For instance, if the goal is to verify the PIP invariant, the 
interfaces must capture information about the multisets of priorities propagated between 
the subgraphs. We define a separation algebra achieving exactly this: the induced flow 
interface of a graph G in this separation algebra captures how values of the flow domain 
must enter and leave G such that, when composed with a compatible graph G′, the 
imposed local conditions on the flow of each node are satisfied in the composite graph. 

This is the key to enabling SL-style framing for global graph properties. Using iter- 
ated separating conjunctions over the new separation algebra, we obtain a compositional 
proof technique that yields succinct proofs of programs such as the PIP, whose proofs 
with existing techniques would involve non-trivial global reasoning steps. 


Contributions. In §2, we present mathematical foundations for flow domains, imposing 
the minimal requirements on the underlying algebra that allow us to capture a broad 
range of data structure invariants and graph properties and reason locally about them in a 
suitable separation algebra. Building on this theory we develop a general proof technique 
for modular reasoning about global graph properties that can be integrated with existing 
separation logics (§3). We further identify general mathematical conditions that can be 
used when desired to guarantee unique flows, and provide local proof arguments to check 
the preservation of these conditions (§4). We demonstrate the versatility of our approach 
by presenting local proofs for two challenging examples: the PIP and the concurrent 
non-blocking list due to Harris [12]. 


Flows Redesigned. Our work is inspired by the recent flow framework explored by 
some of the authors [22], but was redesigned from the ground up. We revisit the core 
algebra behind flow reasoning, and derive a different algebraic foundation by analysing 
the minimal requirements for general local reasoning; we call our newly-designed 
reasoning framework the foundational flow framework. Our new framework makes 
several significant improvements over [22] and eliminates its most stark limitations. We 
provide a detailed technical comparison with [22] and discuss other related work in 


2 The Foundational Flow Framework 


In this section, we introduce the foundational flow framework, explaining the motivation 
for its design with respect to local reasoning principles. We aim for a general technique 
for modularly proving the preservation of recursively-defined invariants over (partial) 
graphs, with well-defined decomposition and composition operations. 


2.1 Preliminaries and Notation 


The term $(b \mathbin{?} t_1 : t_2)$ denotes $t_1$ if condition $b$ holds and $t_2$ otherwise. We write 
$f\colon A \to B$ for a function from $A$ to $B$, and $f\colon A \rightharpoonup B$ for a partial function from 
$A$ to $B$. For a partial function $f$, we write $f(x) = \bot$ if $f$ is undefined at $x$. We use 
lambda notation $(\lambda x.\, E)$ to denote a function that maps $x$ to the expression $E$ (typically 
containing $x$). If $f$ is a function from $A$ to $B$, we write $f[x \mapsto y]$ to denote the function 
from $A \cup \{x\}$ defined by $f[x \mapsto y](z) := (z = x \mathbin{?} y : f(z))$. We use 
$\{x_1 \mapsto y_1, \dots, x_n \mapsto y_n\}$ for pairwise different $x_i$ to denote the function 
$\epsilon[x_1 \mapsto y_1] \cdots [x_n \mapsto y_n]$, where $\epsilon$ is the function on an empty domain. Given 
functions $f_1\colon A_1 \to B$ and $f_2\colon A_2 \to B$ we write $f_1 \uplus f_2$ for the function 
$f\colon A_1 \uplus A_2 \to B$ that maps $x \in A_1$ to $f_1(x)$ and $x \in A_2$ to $f_2(x)$ (if $A_1$ and $A_2$ 
are not disjoint sets, $f_1 \uplus f_2$ is undefined). 

We write $\delta_{n=n'}\colon M \to M$ for the function defined by $\delta_{n=n'}(m) := m$ if $n = n'$ 
else $0$. We also write $\lambda_0 := (\lambda m.\, 0)$ for the identically zero function, 
$\lambda_{\mathsf{id}} := (\lambda m.\, m)$ for the identity function, and use $e \equiv e'$ to denote function 
equality. For $e\colon M \to M$ and $m \in M$ we write $m \triangleright e$ to denote the function 
application $e(m)$. We write $e \circ e'$ to denote function composition, i.e. 
$(e \circ e')(m) = e(e'(m))$ for $m \in M$, and use superscript notation $e^p$ to denote the 
function composition of $e$ with itself $p$ times. 

For multisets S, we use standard set notation when clear from the context. We write 
$S(x)$ to denote the number of occurrences of $x$ in $S$. We write $\{x_1 \mapsto i_1, \dots, x_n \mapsto i_n\}$ 
for the multiset containing $i_1$ occurrences of $x_1$, $i_2$ occurrences of $x_2$, etc. 

A partial monoid is a set $M$, along with a partial binary operation 
$+\colon M \times M \rightharpoonup M$, and a special zero element $0 \in M$, such that (1) $+$ is associative, 
i.e., $(m_1 + m_2) + m_3 = m_1 + (m_2 + m_3)$; and (2) $0$ is an identity, i.e., $m + 0 = 0 + m = m$. 
Here, $=$ means either both sides are defined and equal, or both are undefined. We 
identify a partial monoid with its support set $M$. If $+$ is a total function, then we call 
$M$ a monoid. Let $m_1, m_2, m_3 \in M$ be arbitrary elements of the (partial) monoid in 
the following. We call a (partial) monoid $M$ commutative if $+$ is commutative, i.e., 
$m_1 + m_2 = m_2 + m_1$. Similarly, a commutative monoid $M$ is cancellative if $+$ is 
cancellative, i.e., if $m_1 + m_2 = m_1 + m_3$ is defined, then $m_2 = m_3$. 

A separation algebra [5] is a cancellative, partial, commutative monoid. 


2.2 Flows 


Recursive properties of graphs naturally depend on non-local information; e.g. we cannot 
express that a graph is acyclic directly as a conjunction of per-node invariants. Our 
foundational flow framework defines flow values at each node that capture non-local 
graph properties, and enables local specification and reasoning about such properties. 
Flow values are drawn from a flow domain, an algebraic structure which also specifies 
the operations used to define a flow via recursive computations over the graph. Our 
entire theory is parametric in the choice of a flow domain, whose components will be 
explained and motivated in the rest of this section. 


Definition 1 (Flow Domain). A flow domain $(M, +, 0, E)$ consists of a commutative 
cancellative (total) monoid $(M, +, 0)$ and a set of edge functions $E \subseteq M \to M$. 


Example 1. The path-counting flow domain is $(\mathbb{N}, +, 0, \{\lambda_{\mathsf{id}}, \lambda_0\})$, consisting of the 
monoid of natural numbers under addition and the set of edge functions containing only 
the identity function and the zero function. This can be used to define a flow where the 
values at each node represent the number of paths to this node from a distinguished node 
n. Path-counting provides enough information to express locally per node that e.g. (a) 
all nodes are reachable from n (all path counts are non-zero), or (b) that the graph forms 
a tree rooted at n (all path counts are exactly 1). 


Example 2. We use $(\mathbb{N}^{\mathbb{N}}, \cup, \emptyset, \{\lambda_0\} \cup \{(\lambda m.\, \{\max(m \cup \{p\})\}) \mid p \in \mathbb{N}\})$ as flow 
domain for the PIP example (Figure 1). This consists of the monoid of multisets of natural 
numbers under multiset union and two kinds of edge functions: $\lambda_0$ and functions 
mapping a multiset $m$ to the singleton multiset containing the maximum value between $m$ 
and a fixed value $p$ (used to represent a node's default priority). This can define a flow 
which locally captures the appropriate current node priorities as the graph is modified. 
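
As an informal illustration (not part of the formal development), the two flow domains of Examples 1 and 2 can be modelled in a few lines of Python. This is only a sketch: the names `lam_0`, `lam_id`, `pip_zero` and `pip_edge` are ours, and multisets are represented with `collections.Counter`.

```python
from collections import Counter

# Path-counting domain (Example 1): flow values are naturals, + is addition.
def lam_0(m):
    """Zero edge function: models the absence of an edge."""
    return 0

def lam_id(m):
    """Identity edge function: passes the path count on unchanged."""
    return m

# PIP domain (Example 2): flow values are multisets, + is multiset union.
def pip_zero(m):
    """Zero edge function of the multiset domain."""
    return Counter()

def pip_edge(p):
    """Build the edge function (lambda m. {max(m U {p})}) for a fixed priority p."""
    def edge(m):
        return Counter([max(list(m.elements()) + [p])])
    return edge

# Multiset union is cancellative: a summand can be subtracted again.
m1, m2 = Counter([1, 2]), Counter([2, 2])
recovered = (m1 + m2) - m1          # equals m2

# Applying an edge function with default priority 0 to a sample multiset.
propagated = pip_edge(0)(Counter([1, 2, 2]))
```

Here `propagated` is the singleton multiset containing the maximum of the incoming values and the default priority, which is exactly what the second kind of edge function in Example 2 computes.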


Further definitions in this section assume a fixed flow domain $(M, +, 0, E)$ and a 
(potentially infinite) set of nodes $\mathfrak{N}$. For this section, we abstract heaps using directed 
partial graphs; integration of our graph reasoning with direct proofs over program heaps 
is explained in 


Definition 2 (Graph). A (partial) graph $G = (N, e)$ consists of a finite set of nodes 
$N \subseteq \mathfrak{N}$ and a mapping from pairs of nodes to edge functions $e\colon N \times \mathfrak{N} \to E$. 


Flow Values and Flows. Flow values (taken from M; the first element of a flow domain) 
are used to capture sufficient information to express desired non-local properties of a 
graph. In Example 1, flow values are non-negative integers; for the PIP (Example 2) 
we instead use multisets of integers, representing relevant non-local information: the 
priorities of nodes currently referencing a given node in the graph. Given such flow values, 
a node’s correct priority can be defined locally per node in the graph. This definition 
requires only the maximum value of these multisets, but as we will see shortly these 
multisets enable local recomputation of a correct priority when the graph is changed. 

For a graph $G = (N, e)$ we express properties of $G$ in terms of node-local conditions 
that may depend on the nodes' flow. A flow is a function $\mathit{fl}\colon N \to M$ assigning every 
node a flow value and must be some fixpoint of the following flow equation: 

$$\forall n \in N.\ \mathit{fl}(n) = \mathit{in}(n) + \sum_{n' \in N} \mathit{fl}(n') \triangleright e(n', n) \tag{FlowEqn}$$

Intuitively, one can think of the flow as being obtained by a fold computation over the 
graph:⁴ the inflow $\mathit{in}\colon N \to M$ defines an initial flow at each node. This initial flow 
is then updated recursively for each node $n$: the current flow value at its predecessor 
nodes $n'$ is transferred to $n$ via edge functions $e(n', n)\colon M \to M$. These flow values are 
aggregated using the summation operation $+$ of the flow domain to obtain an updated 
flow of $n$; a flow for the graph is some fixpoint satisfying this equation at all nodes.⁵ 
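
The fold intuition can be made concrete for the path-counting domain. The following Python sketch is our own illustration: it checks the flow equation directly and finds a fixpoint by naive iteration on a small acyclic graph. The helper names (`make_edge`, `satisfies_flow_eqn`, `iterate_flow`) and the example graph are assumptions; on cyclic graphs the iteration need not stabilise, and a fixpoint need not be unique.

```python
def make_edge(arcs):
    """Label (s, t) with the identity if s -> t is an arc, else with the zero function."""
    def edge(s, t):
        return (lambda m: m) if (s, t) in arcs else (lambda m: 0)
    return edge

def satisfies_flow_eqn(nodes, edge, inflow, flow):
    """Check fl(n) = in(n) + sum over n' in N of e(n', n)(fl(n')) at every node n."""
    return all(
        flow[n] == inflow[n] + sum(edge(src, n)(flow[src]) for src in nodes)
        for n in nodes
    )

def iterate_flow(nodes, edge, inflow, max_rounds=100):
    """Iterate the right-hand side of the flow equation, starting from the inflow.

    Returns a fixpoint, or None if none is reached within max_rounds.
    """
    flow = dict(inflow)
    for _ in range(max_rounds):
        new = {n: inflow[n] + sum(edge(s, n)(flow[s]) for s in nodes) for n in nodes}
        if new == flow:
            return flow
        flow = new
    return None

# Diamond-shaped graph: a -> b, a -> c, b -> d, c -> d; paths are counted from a.
nodes = ["a", "b", "c", "d"]
edge = make_edge({("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")})
inflow = {"a": 1, "b": 0, "c": 0, "d": 0}
flow = iterate_flow(nodes, edge, inflow)
```

With an inflow of 1 at `a` and 0 elsewhere, the computed flow assigns each node the number of paths reaching it from `a`; node `d` is reached along two paths.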


Definition 3 (Flow Graph). A flow graph $H = (N, e, \mathit{fl})$ is a graph $(N, e)$ and function 
$\mathit{fl}\colon N \to M$ such that there exists an inflow $\mathit{in}\colon N \to M$ satisfying FlowEqn$(\mathit{in}, e, \mathit{fl})$. 


We let $\mathrm{dom}(H) = N$, and sometimes identify $H$ and $\mathrm{dom}(H)$ to ease notational 
burden. For $n \in H$ we write $H_n$ for the singleton flow subgraph of $H$ induced by $n$. 


Edge Functions. In any flow graph, the flow value assigned to a node n by a flow 
is propagated to its neighbours n’ (and transitively) according to the edge function 
e(n, n’) labelling the edge (n, n’). The edge function maps the flow value at the source 
node n to one propagated on this edge to the target node n’. Note that we require such 
a labelling for all pairs consisting of a source node n inside the graph and a target 
node $n' \in \mathfrak{N}$ (i.e., possibly outside the graph). The $0$ flow value (the third element 
of our flow domains) is used to represent no flow; the corresponding (constant) zero 
function $\lambda_0 = (\lambda m.\, 0)$ is used as edge function to model the absence of an edge in the 
graph. A set of edge functions $E$ from which this labelling is chosen can, other than 
the requirement $\lambda_0 \in E$, be chosen as desired. As we will see later, restrictions to 
particular sets of edge functions $E$ can be exploited to further strengthen our overall 
technique. Edge functions can depend on the local state of the source node (as in the 
following example); dependencies from elsewhere in the graph must be represented by 
the node’s flow. 


Example 3. Consider the graph in Figure 1 and the flow domain as in Example 2. We 
choose the edge functions to be $\lambda_0$ where no edge exists in the PIP structure, and 
otherwise $(\lambda m.\, \{\max(m \cup \{d\})\})$ where $d$ is the default priority of the source of the edge. 
For example, in Figure 1, $e(r_3, p_2) = \lambda_0$ and $e(r_3, p_1) = (\lambda m.\, \{\max(m \cup \{0\})\})$. 
Since the flow value at $r_3$ is $\{1, 2, 2\}$, the edge $(r_3, p_1)$ propagates the value $\{2\}$ to $p_1$, 
correctly representing the current priority of $r_3$. 


Flow Aggregation and Inflows. The flow value at a node is defined by those propagated 
to it from each node in a graph via edge functions, along with an additional inflow value 
explained here. Since multiple non-zero flow values can be propagated to a node, we 
require an aggregation of these values via a binary + operator on flow values: the second 
element of our flow domains. The edges from which the aggregated values originate 
are unordered. Thus, we require + to be commutative and associative, making this 
aggregation order-independent. The 0 flow value must act as a unit for +. For example, 
in the path-counting flow domain + means addition on natural numbers, while for the 
multisets employed for the PIP it means multiset union. 


4 We note that flows are not generally defined in this manner as we consider any fixpoint of the 
flow equation to be a flow. Nonetheless, the analogy helps to build an initial intuition. 
5 We discuss questions regarding the existence and uniqueness of such fixpoints in 

Each node in a flow graph has an inflow, modelling contributions to its flow value 
which do not come from inside the graph. Inflows play two important roles: first, since 
our graphs are partial, they model contributions from nodes outside of the graph. Second, 
inflow can be artificially added as a means of specialising the computation of flow values 
to characterise specific graph properties. For example, in the path-counting domain, we 
give an inflow of 1 to the node from which we are counting paths, and 0 to all others. 


Example 4. Let the edges in the graph in Figure 1 be labelled as described in Example 3. 
If the inflow function $\mathit{in}$ assigns the empty multiset to every node $n$ and we let $\mathit{fl}(n)$ be 
the multiset labelling every node in the figure, then FlowEqn$(\mathit{in}, e, \mathit{fl})$ holds. 


The flow equation defines the flow of a node n to be the aggregation of 
flow values coming from other nodes n’ inside the graph (as given by the respective edge 
function e(n’, n)) as well as the inflow in(n). Preserving solutions to this equation across 
updates to the graph structure is a fundamental goal of our technique. The following 
lemma (which relies on the fact that + is required to be cancellative) states that any 
correct flow values uniquely determine appropriate inflow values: 


Lemma 1. Given a flow graph $(N, e, \mathit{fl})$, there exists a unique inflow $\mathit{in}$ such that 
FlowEqn$(\mathit{in}, e, \mathit{fl})$. 
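
For the path-counting domain, the inflow promised by Lemma 1 can be computed by subtraction. The sketch below is our own illustration under that assumption (flow values are naturals); `inflow_of` and the example graph are hypothetical names and data, not taken from the paper.

```python
def inflow_of(nodes, edge, flow):
    """Return the unique inflow of (nodes, edge, flow), or None if there is none."""
    inflow = {}
    for n in nodes:
        received = sum(edge(src, n)(flow[src]) for src in nodes)
        if received > flow[n]:
            return None                    # fl(n) = in(n) + received has no solution
        inflow[n] = flow[n] - received     # unique because + is cancellative
    return inflow

# Diamond-shaped graph: a -> b, a -> c, b -> d, c -> d.
arcs = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")}

def edge(s, t):
    return (lambda m: m) if (s, t) in arcs else (lambda m: 0)

nodes = ["a", "b", "c", "d"]
good = inflow_of(nodes, edge, {"a": 1, "b": 1, "c": 1, "d": 2})
bad = inflow_of(nodes, edge, {"a": 1, "b": 0, "c": 1, "d": 2})
```

The first candidate flow has an inflow (1 at `a`, 0 elsewhere), so it yields a flow graph; the second does not, because `b` would receive more from `a` than its claimed flow value.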


We now turn to how solutions of the flow equation can be preserved or appropriately 
updated under changes to the underlying graph. 


Graph Updates and Cancellativity. Given a flow graph with known flow and inflow 
values, suppose we remove an edge from $n_1$ to $n_2$ (replacing the edge function with 
$\lambda_0$). For the same inflow, such an update will potentially affect the flow at $n_2$ and nodes 
to which $n_2$ (transitively) propagates flow. Starting from the simple case that $n_2$ has 
no outgoing edges, we need to recompute a suitable flow at $n_2$. Knowing the old flow 
value (say, $m$) and the contribution $m' = \mathit{fl}(n_1) \triangleright e(n_1, n_2)$ previously provided along 
the removed edge, we know that the correct new flow value is some $m''$ such that 
$m' + m'' = m$. This constraint has a unique solution (and thus, we can unambiguously 
recompute a new flow value) exactly when the aggregation $+$ is cancellative; we therefore 
make cancellativity a requirement on the $+$ of any flow domain. 

Cancellativity intuitively enforces that the flow domain carries enough information 
to enable adaptation to local updates (in particular, removal of edges).⁶ Returning to the 
PIP example, cancellativity requires us to carry multisets as flow values rather than only 
the maximum priority value: + cannot be the maximum operation, as this would not be 
cancellative. The resulting multisets (like the prio fields in the actual code) provide the 
information necessary to recompute corrected priority values locally. 

For example, in the PIP graph shown in Figure 1, removing the edge from $p_7$ to 
$r_4$ would not affect the current priority of $r_4$ whereas if $p_7$ had current priority 1 instead 
of 2, then the current priority of $r_4$ would have to decrease. In either case, recomputing 
the flow value for $r_4$ is simply a matter of subtraction (removing $\{2\}$ from the multiset at 
$r_4$); cancellativity guarantees that our flow domains will always provide the information 


6 As we will show in §2.3, an analogous problem for composition of flow graphs is also directly 
solved by this choice to force aggregation to be cancellative. 

needed for this recomputation. Without this property, the recomputation of a flow value 
for the target node n would, in general, entail recomputing the incoming flow values 
from all remaining edges from scratch. Cancellativity is also crucial for Lemma 1 above, 
forcing uniqueness of inflows, given known flow values in a flow graph. This allows us 
to define natural but powerful notions of flow graph decomposition and recomposition. 
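
The role of cancellativity in edge removal can be sketched in Python. The multiset values below are hypothetical (they are not read off Figure 1), and `remove_contribution` is our own helper name.

```python
from collections import Counter

def remove_contribution(flow_value, contribution):
    """Return the unique m'' with contribution + m'' == flow_value (multiset domain)."""
    if any(flow_value[x] < k for x, k in contribution.items()):
        raise ValueError("contribution is not part of the flow value")
    return flow_value - contribution

# Removing an edge that used to contribute {2} to a node whose flow was {1, 2, 2}.
new_flow = remove_contribution(Counter([1, 2, 2]), Counter([2]))

# With max as the aggregation, the same question has several answers:
# every m in {0, 1, 2} satisfies max(2, m) == 2, so max is not cancellative.
candidates = [m for m in range(4) if max(2, m) == 2]
```

The multiset domain gives exactly one answer, whereas storing only the maximum leaves the new value undetermined, which is why the PIP flow domain carries multisets.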


2.3 Flow Graph Composition and Abstraction 


Building towards the core of our reasoning technique, we now turn to the question 
of decomposition and recomposition of flow graphs. Two flow graphs with disjoint 
domains always compose to a graph, but this will be a flow graph only if their flows are 
chosen consistently to admit a solution to the resulting flow equation (i.e. the flow graph 
composition operator $\odot$ defined below is partial). 


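Definition 4 (Flow Graph Algebra). The flow graph algebra $(\mathsf{FG}, \odot, H_\emptyset)$ for the flow 
domain $(M, +, 0, E)$ is defined by 

$$\mathsf{FG} := \{(N, e, \mathit{fl}) \mid (N, e, \mathit{fl}) \text{ is a flow graph}\}, \qquad H_\emptyset := (\emptyset, e_\emptyset, \mathit{fl}_\emptyset),$$

$$(N_1, e_1, \mathit{fl}_1) \odot (N_2, e_2, \mathit{fl}_2) := \begin{cases} H & \text{if } H = (N_1 \uplus N_2,\ e_1 \uplus e_2,\ \mathit{fl}_1 \uplus \mathit{fl}_2) \in \mathsf{FG} \\ \bot & \text{otherwise,} \end{cases}$$

where $e_\emptyset$ and $\mathit{fl}_\emptyset$ are the edge functions and flow on the empty set of nodes $N = \emptyset$. 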


Intuitively, two flow graphs compose to a flow graph if their contributions to each 
others’ flow (along edges from one to the other) are reflected in the corresponding inflow 
of the other graph. For example, consider the subgraph from Figure 1 consisting of 
the single node $p_7$ (with $0$ inflow). This will compose with the remainder of the graph 
depicted only if this remainder subgraph has an inflow which, at node $r_4$, includes at 
least the multiset $\{2\}$, reflecting the propagated value from $p_7$. 

We use this intuition to extract an abstraction of flow graphs which we call flow 
interfaces. Given a flow (sub)graph, its flow interface consists of the node-wise inflow 
and outflow (the flow contributions its nodes make to all nodes outside of the graph, 
defined below). It is thus an abstraction that hides the flow values and edges that are 
wholly inside the flow graph. Flow graphs that have the same flow interface “look the 
same” to the external graph, as the same values are propagated inwards and outwards. 


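Definition 5 (Flow Interface). For a given flow domain $M$, a flow interface is a pair 
$I = (\mathit{in}, \mathit{out})$ where $\mathit{in}\colon N \to M$ and $\mathit{out}\colon \mathfrak{N} \setminus N \to M$ for some $N \subseteq \mathfrak{N}$. 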


We write $I.\mathit{in}$, $I.\mathit{out}$ for the two components of the interface $I = (\mathit{in}, \mathit{out})$. We will 
again sometimes identify $I$ and $\mathrm{dom}(I.\mathit{in})$ to ease notational burden. 

Given a flow graph $H \in \mathsf{FG}$, we can compute its interface as follows. Recall that 
Lemma 1 implies that any flow graph has a unique inflow. Thus, we can define an inflow 
function that maps each flow graph $H = (N, e, \mathit{fl})$ to the unique inflow $\mathsf{inf}(H)\colon N \to M$ 
such that FlowEqn$(\mathsf{inf}(H), e, \mathit{fl})$. Dually, we define the outflow of $H$ as the function 
$\mathsf{outf}(H)\colon \mathfrak{N} \setminus N \to M$ defined by $\mathsf{outf}(H)(n) := \sum_{n' \in N} \mathit{fl}(n') \triangleright e(n', n)$. The flow 
interface of $H$, written $\mathsf{int}(H)$, is the pair $(\mathsf{inf}(H), \mathsf{outf}(H))$ consisting of its inflow 
and its outflow. Returning to the previous example, if $H$ is the singleton subgraph 
consisting of node $p_7$ from Figure 1, with flow and edges as depicted, then 
$\mathsf{int}(H) = (\lambda n.\, \emptyset,\ \lambda n.\, (n = r_4 \mathbin{?} \{2\} : \emptyset))$. 
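
The interface of the singleton subgraph from this example can be recomputed with a short Python sketch. Several things here are assumptions for illustration: the node universe is cut down to three nodes, the default priority 2 of `p7` is inferred from the stated outflow {2}, and `interface` is our own helper name.

```python
from collections import Counter

def interface(nodes, universe, edge, flow):
    """Compute (inf(H), outf(H)) for H = (nodes, edge, flow) over a finite universe."""
    def received(n):
        # Aggregate what the nodes of H send to n along their edges.
        return sum((edge(s, n)(flow[s]) for s in nodes), Counter())
    inf = {n: flow[n] - received(n) for n in nodes}    # assumes H is a flow graph
    outf = {n: received(n) for n in universe - nodes}
    return inf, outf

def edge(s, t):
    """The only edge is p7 -> r4, labelled with p7's (assumed) default priority 2."""
    if (s, t) == ("p7", "r4"):
        return lambda m: Counter([max(list(m.elements()) + [2])])
    return lambda m: Counter()

inf, outf = interface({"p7"}, {"p7", "r4", "r3"}, edge, {"p7": Counter()})
```

The result maps `p7` to the empty inflow and sends {2} to `r4` and nothing to any other outside node, matching the interface stated in the text.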

This abstraction, while simple, turns out to be powerful enough to build a separation 
algebra over our flow graphs, allowing them to be decomposed, locally modified and 
recomposed in ways yielding all the local reasoning benefits of separation logics. In 
particular, for graph operations within a subgraph with a certain interface, we need to 
prove: (a) that the modified subgraph is still a flow graph (by checking that the flow 
equation still has a solution locally in the subgraph) and (b) that it satisfies the same 
interface (in other words, the effect of the modification on the flow is contained within 
the subgraph); the meta-level results for our technique then justify that we can recompose 
the modified subgraph with any graph that the original could be composed with. 

We define the corresponding flow interface algebra as follows: 


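Definition 6 (Flow Interface Algebra). For a given flow domain $M$, the flow interface 
algebra over $M$ is defined to be $(\mathsf{FI}, \oplus, I_\emptyset)$, where: 

$$\mathsf{FI} := \{I \mid I \text{ is a flow interface}\}, \qquad I_\emptyset := \mathsf{int}(H_\emptyset),$$

$$I_1 \oplus I_2 := \begin{cases} I & \text{if } I_1 \cap I_2 = \emptyset \\ & \land\ \forall i \neq j \in \{1, 2\},\ n \in I_i.\ I_i.\mathit{in}(n) = I.\mathit{in}(n) + I_j.\mathit{out}(n) \\ & \land\ \forall n \notin I.\ I.\mathit{out}(n) = I_1.\mathit{out}(n) + I_2.\mathit{out}(n) \\ \bot & \text{otherwise.} \end{cases}$$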


Flow interface composition is well-defined because of cancellativity of the underlying 
flow domain (it is also, exactly as flow graph composition, partial). We next show the 
key result for this abstraction: the ability for two flow graphs to compose depends only 
on their interfaces; flow interfaces implicitly define a congruence relation on flow graphs. 


Lemma 2. $\mathsf{int}(H_1) = I_1 \land \mathsf{int}(H_2) = I_2 \Rightarrow \mathsf{int}(H_1 \odot H_2) = I_1 \oplus I_2$. 
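
To see interface composition at work, the following Python sketch composes two path-counting interfaces. It is an illustration under our own conventions: an interface is a pair of dictionaries, outflow entries equal to 0 are omitted, `None` stands for an undefined composition, and the interfaces are those of the two halves of a diamond-shaped graph a -> b, a -> c, b -> d, c -> d.

```python
def compose(i1, i2):
    """Compose two path-counting interfaces (in, out); None models an undefined result."""
    (in1, out1), (in2, out2) = i1, i2
    if set(in1) & set(in2):
        return None                        # the domains must be disjoint
    new_in = {}
    for own_in, other_out in ((in1, out2), (in2, out1)):
        for n, value in own_in.items():
            # Solve own_in(n) = new_in(n) + other_out(n) for new_in(n).
            external = value - other_out.get(n, 0)
            if external < 0:
                return None                # the other part sends more than n admits
            new_in[n] = external
    targets = (set(out1) | set(out2)) - set(new_in)
    new_out = {n: out1.get(n, 0) + out2.get(n, 0) for n in targets}
    return new_in, new_out

# Interfaces of the subgraphs {a, b} and {c, d}, with paths counted from a.
i1 = ({"a": 1, "b": 0}, {"c": 1, "d": 1})
i2 = ({"c": 1, "d": 1}, {})
whole = compose(i1, i2)

# An interface that expects no inflow at c cannot absorb the 1 sent by a.
mismatch = compose(i1, ({"c": 0, "d": 1}, {}))
```

The composite has inflow 1 at `a`, inflow 0 elsewhere, and no outflow, which is the interface of the whole diamond graph, as Lemma 2 predicts.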


Crucially, the following result shows that we can use our flow interfaces as an 
abstraction directly compatible with existing separation logics. 


Theorem 1. The flow interface algebra $(\mathsf{FI}, \oplus, I_\emptyset)$ is a separation algebra. 


This result forms the core of our reasoning technique; it enables us to make modifi- 
cations within a chosen subgraph and, by proving preservation of its interface, know that 
the result composes with any context exactly as the original did. Flow interfaces cap- 
ture precisely the information relevant about a flow graph, with respect to composition 
with other flow graphs. In Appendix B of the accompanying technical report (hereafter, 
TR) we provide additional examples of flow domains that demonstrate the range of 
data structures and graph properties that can be expressed using flows, including a notion 
of universal flow that in a sense provides a completeness result for the expressivity of 
the framework. We now turn to constructing proofs atop these new reasoning principles. 


318 S. Krishna et al. 


3 Proof Technique 


This section shows how to integrate flow reasoning into a standard separation logic, 
using the priority inheritance protocol (PIP) algorithm to illustrate our proof techniques. 

Since flow graphs and flow interfaces form separation algebras, it is possible in 
principle to define a separation logic (SL) using these notions as a custom semantic 
model (indeed, this is the proof approach taken in [22]). By contrast, we integrate flow 
interfaces with a standard separation logic without modifying its semantics. This has 
the important technical advantage that our proof technique can be naturally integrated 
with existing separation logics and verification tools supporting SL-style reasoning. We 
consider a standard sequential SL in this section, but our technique can also be directly 
integrated with a concurrent SL such as RGSep (as we show later) or frameworks such 
as Iris supporting (ghost) resources ranging over user-defined separation algebras. 


3.1 Encoding Flow-based Proofs in SL 


Proofs using our flow framework can employ a combination of specifications enforced 
at the node level and in terms of the flow graphs and interfaces corresponding to larger 
heap regions such as entire data structures (henceforth, composite graphs and composite 
interfaces). At the node level, we write invariants that every node is intended to satisfy, 
typically relating the node’s flow value to its local state (fields). For example, in the PIP, 
we use node-local invariants to express that a node’s current priority is the maximum of 
the node’s default priority and those in its current flow value. We typically express such 
specifications in terms of singleton (flow) graphs, and their singleton interfaces. 

Specification in terms of composite interfaces has several important purposes. One 
is to define custom inflows: e.g. in the path-counting flow domain, specifying that the 
inflow of a composite interface is 1 at some designated node r and 0 elsewhere enforces 
in any underlying flow graph that each node n’s flow value will be the number of paths 
from r to n.⁷ Composite interfaces can also be used to express that, in two states of 
execution, a portion of the heap “looks the same” with respect to composition (it has the 
same interface, and so can be composed with the same flow graphs), or to capture by 
how much there is an observable difference in inflow or outflow; we employ this idea in 
the PIP proof below. 

We now define an assertion syntax convenient for capturing both node-level and 
composite-level constraints, defined within an SL-style proof system. We assume an 
intuitionistic, garbage-collected SL [6] with standard syntax and semantics;⁸ see Appendix A 
of the TR for more details. 


Node Predicates. The basic building block of our flow-based specifications is a node 
predicate N(x, H), representing ownership of the fields of a single node x, as well as 


7 Note that the analogous property cannot be captured at the node level; when considering 
singleton interfaces per node in a tree rooted at r, every singleton interface has an inflow of 1. 

8 As $P * \phi \equiv P \land \phi$ for pure formulas $P$ in garbage-collected SLs, we use $*$ instead of $\land$ 
throughout this paper. 

capturing its corresponding singleton flow graph H: 

$$\mathsf{N}(x, H) := \exists \mathit{fs}, \mathit{fl}.\ x \mapsto \mathit{fs} * H = (\{x\}, (\lambda y.\, \mathsf{edge}(x, \mathit{fs}, y)), \mathit{fl}) * \gamma(x, \mathit{fs}, \mathit{fl}(x))$$


N is implicitly parameterised by fs, edge and $\gamma$; these are explained next and are typically 
fixed across any given flow-based proof. The N predicate expresses that we have a heap 
cell at location $x$ containing fields fs (a list of field-name/value mappings).⁹ It also 
says that $H$ is a singleton flow graph with domain $\{x\}$ with some flow fl, whose edge 
functions are defined by a user-defined abstraction function edge$(x, \mathit{fs}, y)$; this function 
allows us to define edges in terms of $x$'s field values. Finally, the node, its fields, and 
its flow in this flow graph satisfy the custom predicate $\gamma$, used to encode node-local 
properties such as constraints in terms of the flow values of nodes. 


Graph Predicates. The analogous predicate for composite graphs is Gr. It carries 
ownership of the nodes making up a potentially unbounded graph, using iterated separating 
conjunction over a set of nodes $X$ as mentioned earlier: 

$$\mathsf{Gr}(X, H) := \exists \mathcal{H}.\ \mathop{\ast}_{x \in X} \mathsf{N}(x, \mathcal{H}(x)) * H = \bigodot_{x \in X} \mathcal{H}(x)$$

Gr is also implicitly parameterised by fs, edge and $\gamma$. The existentially-quantified $\mathcal{H}$ is 
a logical variable representing a function from nodes in $X$ to corresponding singleton 
flow graphs. $\mathsf{Gr}(X, H)$ describes a set of nodes $X$, such that each $x \in X$ is an N (in 
particular, it satisfies $\gamma$), whose singleton flow graphs compose back to $H$. As well as 
carrying ownership of the underlying heap locations, Gr's definition allows us to connect 
a node-level view of the region $X$ (each $\mathcal{H}(x)$) with a composite-level view defined by 
$H$, on which we can impose appropriate graph-level properties such as constraints on 
the region's inflow. 


Lifting to Interfaces. Flow based proofs can often be expressed more elegantly and 
abstractly using predicates in terms of node and composite-level interfaces rather than 
flow graphs. To this end, we overload both our node and graph predicates with analogues 
whose second parameter is a flow interface, defined as follows: 


$$\mathsf{N}(x, I) := \exists H.\ \mathsf{N}(x, H) * I = \mathsf{int}(H)$$

$$\mathsf{Gr}(X, I) := \exists H.\ \mathsf{Gr}(X, H) * I = \mathsf{int}(H)$$


We will use these versions in the PIP proof below; interfaces capture all relevant proper- 
ties for decomposition and composition of these flow graphs. 


Flow Lemmas. We first illustrate our N and Gr predicates (which capture SL ownership 
of heap regions and abstract these with flow interfaces) by identifying a number of 
lemmas which are generically useful in flow-based proofs. Reasoning at the level of flow 
interfaces is entirely in the pure world (mathematics independent of heap-ownership and 


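9 For simplicity, we assume that all fields of a flow graph node are to be handled by our 
flow-based technique, and that their ownership (via $\mapsto$ points-to predicates) is always carried 
around together; lifting these restrictions would be straightforward. 

$$\begin{array}{lr}
\mathsf{Gr}(X_1 \uplus X_2, H) \models \exists H_1, H_2.\ \mathsf{Gr}(X_1, H_1) * \mathsf{Gr}(X_2, H_2) * H_1 \odot H_2 = H & \text{(DECOMP)} \\
\mathsf{Gr}(X_1, H_1) * \mathsf{Gr}(X_2, H_2) * H_1 \odot H_2 \neq \bot \models \mathsf{Gr}(X_1 \uplus X_2, H_1 \odot H_2) & \text{(COMP)} \\
\mathsf{N}(x, H) \equiv \mathsf{Gr}(\{x\}, H) & \text{(SING)} \\
\mathsf{emp} \models \mathsf{Gr}(\emptyset, H_\emptyset) & \text{(GREMP)} \\
\mathsf{Gr}(X_1, H_1') * \mathsf{Gr}(X_2, H_2) * H = H_1 \odot H_2 * \mathsf{int}(H_1) = \mathsf{int}(H_1') & \\
\quad \models \mathsf{Gr}(X_1 \uplus X_2, H_1' \odot H_2) * \mathsf{int}(H) = \mathsf{int}(H_1' \odot H_2) & \text{(REPL)}
\end{array}$$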


Fig. 2: Some useful lemmas for proving entailments between flow-based specifications. 


resources) with respect to the underlying SL reasoning; these lemmas are consequences 
of our predicate definitions and the foundational flow framework definitions themselves. 

Examples of these lemmas are shown in Figure 2. (DECOMP) shows that we can 
always decompose a valid flow graph into subgraphs which are themselves flow graphs. 
Recomposition (COMP) is possible only if the subgraphs compose. These rules, as well 
as (SING) and (GREMP), follow directly from the definition of Gr and standard SL 
properties of iterated separating conjunction. The final rule (REPL) is a direct consequence of 
rules (COMP), (DECOMP) and the congruence relation on flow graphs induced by their 
interfaces (cf. Lemma 2). Conceptually, it expresses that after decomposing any flow 
graph into two parts $H_1$ and $H_2$, we can replace $H_1$ with a new flow graph $H_1'$ with the 
same interface; when recomposing, the overall graph will be a flow graph with the same 
overall interface. 

Note the connection between rules (COMP)/(DECOMP) and the algebraic laws of 
standard inductive predicates such as ls describing a segment of a linked list [2]. For 
instance, by combining the definition of Gr with these rules, we can prove the 
following graph analogue of the rule to separate a list into the head node and the tail: 

$$\mathsf{Gr}(X \uplus \{y\}, H) \equiv \exists H_y, H'.\ \mathsf{N}(y, H_y) * \mathsf{Gr}(X, H') * H = H_y \odot H' \tag{(UN)FOLD}$$

However, crucially (and unlike when using general inductive predicates [32]), this rule 
is symmetrical for any node x in X; it works analogously for any desired order of 
decomposition of the graph, and for any data structure specified using flows. 

When working with our overloaded N and Gr predicates, similar steps to those 
described by the above lemmas are useful. Given these overloaded predicates, we simply 
apply the lemmas above to the existentially quantified flow-graphs in their definitions and 
then lift the consequence of the lemma back to the interface level using the congruence 
between our flow graph and interface composition notions (Lemma 2). 


3.2 Proof of the PIP 


We now have all the tools necessary to verify the priority inheritance protocol (PIP). 
Figure 3 gives the full algorithm with flow-based specifications; we also include some 
intermediate assertions to illustrate the reasoning steps for the acquire method, which 

 1  // Let δ(M, q1, q2) := M \ (q1 ≥ 0 ? {q1} : ∅) ∪ (q2 ≥ 0 ? {q2} : ∅)
 2
 3  method update(n: Ref, from: Int, to: Int)
 4    requires N(n, I_n) ∗ Gr(X \ {n}, I') ∗ I = I_n' ⊕ I' ∗ φ(I) ∗ n ∈ X
 5    requires I_n' = ({n ↦ δ(I_n.in(n), from, to)}, I_n.out) ∗ from ≠ to
 6    ensures Gr(X, I)
 7  {
 8    n.prios := n.prios \ {from}
 9    if (to >= 0) {
10      n.prios := n.prios ∪ {to}
11    }
12    from := n.curr_prio
13    n.curr_prio := max(n.prios ∪ {n.def_prio})
14    to := n.curr_prio
15
16    if (from != to && n.next != null) {
17      update(n.next, from, to)
18    }
19  }
20
21  method acquire(p: Ref, r: Ref)
22    requires Gr(X, I) ∗ φ(I) ∗ p ∈ X ∗ r ∈ X ∗ p ≠ r
23    ensures Gr(X, I)
24  {
25    {∃I_r, I_p, I_1. N(r, I_r) ∗ N(p, I_p) ∗ Gr(X \ {r, p}, I_1) ∗ I = I_r ⊕ I_p ⊕ I_1 ∗ φ(I)}
26    if (r.next == null) {
27      r.next := p
28      // Let q_r = r.curr_prio
29      {∃I_r, I_r', I_p, I_1. N(r, I_r') ∗ N(p, I_p) ∗ Gr(X \ {r, p}, I_1) ∗ I = I_r ⊕ I_p ⊕ I_1
         ∗ I_r' = (I_r.in, {p ↦ {q_r}}) ∗ I_r.out = λ_0 ∗ ···}
30      {∃I_p, I_p', I_2. N(p, I_p) ∗ Gr(X \ {p}, I_2) ∗ I = I_p' ⊕ I_2
         ∗ I_p' = ({p ↦ δ(I_p.in(p), −1, q_r)}, I_p.out) ∗ ···}
31      update(p, -1, r.curr_prio)
32      {Gr(X, I)}
33    } else {
34      p.next := r; update(r, -1, p.curr_prio)
35    }
36  }
37
38  method release(p: Ref, r: Ref)
39    requires Gr(X, I) ∗ φ(I) ∗ p ∈ X ∗ r ∈ X ∗ p ≠ r
40    ensures Gr(X, I)
41  { r.next := null; update(p, r.curr_prio, -1) }


Fig. 3: Full PIP code and specifications, with proof sketch for acquire. The comments 
and coloured annotations (lines 29 to 32) are used to highlight steps in the proof, and are 
explained in detail in the text. 

we explain in more detail below.¹⁰ We instantiate our framework in order to capture the 
PIP invariants as follows: 


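$$\begin{aligned}
\mathit{fs} &:= \{\mathsf{next}\colon y,\ \mathsf{curr\_prio}\colon q,\ \mathsf{def\_prio}\colon q^0,\ \mathsf{prios}\colon Q\} \\
\mathsf{edge}(x, \mathit{fs}, z) &:= \begin{cases} (\lambda m.\, \{\max(m \cup \{q^0\})\}) & \text{if } z = y \neq \mathsf{null} \\ \lambda_0 & \text{otherwise} \end{cases} \\
\gamma(x, \mathit{fs}, m) &:= q^0 \geq 0 * (\forall q' \in Q.\ q' \geq 0) * m = Q * q = \max(Q \cup \{q^0\}) \\
\varphi(I) &:= I = (\lambda_0, \lambda_0)
\end{aligned}$$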


Each node has the four fields listed in fs. fs also defines variables such as $y$ to denote 
field values that are used in the definitions of edge and $\gamma$; these variables are bound to the 
heap by N. edge abstracts the heap into a flow graph by letting each node have an edge 
to its next successor labelled by a function that passes to it the maximum incoming 
priority or the node's default priority: whichever is larger. With this definition, one can 
see that the flow of every node will be the multiset containing exactly the priorities of 
its predecessors. The node-local invariant $\gamma$ says that all priorities are non-negative, the 
flow $m$ of each node is stored in the prios field, and its current priority is the maximum 
of its default and incoming priorities. Finally, the constraint $\varphi$ on the global interface 
expresses that the graph is closed: it has no inflow or outflow. 
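
The instantiation can be exercised on a concrete heap. The Python sketch below is our own illustration: a heap is a dictionary from node names to field dictionaries, all `next` pointers are assumed to stay within the heap (so there is no outflow), and the function names mirror edge and the node-local invariant but are otherwise hypothetical.

```python
from collections import Counter

def edge(fields, z):
    """edge(x, fs, z): a max-priority function along x's next pointer, zero otherwise."""
    if fields["next"] is not None and fields["next"] == z:
        return lambda m: Counter([max(list(m.elements()) + [fields["def_prio"]])])
    return lambda m: Counter()

def gamma(fields, m):
    """Node-local invariant for a node with field values `fields` and flow value m."""
    q0, q, prios = fields["def_prio"], fields["curr_prio"], fields["prios"]
    return (q0 >= 0 and all(x >= 0 for x in prios.elements())
            and m == prios and q == max(list(prios.elements()) + [q0]))

def closed_flow_graph(heap):
    """Check gamma at every node and the flow equation with zero inflow, using fl(n) := n.prios."""
    flow = {x: fs["prios"] for x, fs in heap.items()}
    for n in heap:
        received = sum((edge(heap[s], n)(flow[s]) for s in heap), Counter())
        if flow[n] != received or not gamma(heap[n], flow[n]):
            return False
    return True

# A chain p -> r -> q in which p's priority 3 has been inherited by r and q.
heap = {
    "p": {"next": "r", "def_prio": 3, "curr_prio": 3, "prios": Counter()},
    "r": {"next": "q", "def_prio": 0, "curr_prio": 3, "prios": Counter([3])},
    "q": {"next": None, "def_prio": 1, "curr_prio": 3, "prios": Counter([3])},
}
valid = closed_flow_graph(heap)

# Forgetting the inherited priority at q breaks the flow equation there.
broken = dict(heap, q={"next": None, "def_prio": 1, "curr_prio": 1, "prios": Counter()})
invalid = closed_flow_graph(broken)
```

In the valid heap, each `prios` field is exactly the multiset of current priorities of the node's predecessors, so the heap abstracts to a closed flow graph; the broken variant fails the check at `q`.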


Flow Specifications for the PIP. Our specifications of acquire and release guarantee 
that if we start with a valid flow graph (closed, according to $\varphi$), we are guaranteed to 
return a valid flow graph with the same interface (i.e. the graph remains closed). For 
clarity of the exposition, we focus here on how we prove that being a flow graph that 
satisfies the PIP invariant is preserved (as is the composite flow graph's interface). 
Extending this specification to one which proves, e.g., that acquire adds the expected 
edge is straightforward (see Appendix C of the TR).¹¹ 

The specification for update is somewhat subtle, and exploits the full flexibility 
of flow interfaces as a specification medium. The preconditions of update describe an 
update to the graph which is not yet completed. There are three complementary aspects 
to this specification. Firstly (as for acquire and release), node-local invariants (γ) 
hold for all nodes in the graph (enforced via N and Gr predicates). Secondly, we employ 
flow interfaces to express a decomposition of the original top-level interface I into 
compatible (primed) sub-interfaces. The key to understanding this specification is that 
I′ₙ is in some sense a fake interface; it does not abstract the current state of the heap node 
n. Instead, I′ₙ expresses the way in which the node n’s current inflow hasn’t yet been 
accounted for in the heap: if n could adjust its inflow according to the propagated 
priority change without changing its outflow, then it would compose back with the rest of 
the graph, and restore the graph’s overall interface. The shorthand δ defines the required 
change to n’s inflow. 

In general (except when n’s next field is null, or n’s flow value is unchanged), it 
is not even possible for n’s fields to be updated to satisfy I′ₙ; by updating n’s inflow, 
we will necessarily update its outflow. However, we can then construct a corresponding 
“fake” interface for the next node in the graph, reflecting the update yet to be accounted 
for, and establishing the precondition for the recursive call to update. 

10 In specifications, we implicitly quantify at the top level over free variables such as I. λ₀ denotes 
an identically zero function on an unconstrained domain. 
11 We also omit acquire’s precondition that p.next == null for brevity. 

The third specification aspect is the connection between heap-level nodes and 
interfaces. The N(n, Iₙ) predicate connects n with a different interface; Iₙ is the actual 
current abstraction of n’s state. Conceptually, the key property which is broken at this 
point is this connection between the interface-level specification and the heap at node n, 
reflected by the decomposition in the specification between X \ {n} and {n}. 

We note that the same specification ideas and proof style can be easily adapted to 
other data structure implementations with an update-notify style, including well-known 
designs such as Subject-Observer patterns, or the Composite pattern [27]. 


Proof Outline. To illustrate the application of flows reasoning to our PIP specification 
ideas more clearly, we examine in detail the first if-branch in the proof of acquire. Our 
intermediate proof steps are shown as purple annotations surrounded by braces. The first 
step, as shown in the first line inside the method body, is to apply twice (on 
the flow graphs represented by these predicates) and peel off N predicates for each of r 
and p. The update to r’s next field (line 27) causes the correct singleton interface of r to 
change to I′ᵣ: its outflow (previously none, since the next field was null) now propagates 
flow to p. We summarise this state in the assertion on line 29 (we omit e.g. repetition 
of properties from the function’s precondition, focusing on the flow-related steps of 
the argument). We now rewrite this state; using the definition of interface composition 
(Definition 6) we deduce that although I′ᵣ and Iₚ do not compose (since the former has 
outflow that the latter does not account for as inflow), the alternative “fake” interface 
I′ₚ for p (which artificially accounts for the missing inflow) would do so (cf. line 30). 
Essentially, we show Iᵣ ⊕ Iₚ = I′ᵣ ⊕ I′ₚ, i.e. that the interface of {r, p} would be unchanged 
if p could somehow have interface I′ₚ. Now by setting I₂ = I′ᵣ ⊕ I₁ and using algebraic 
properties of interfaces, we assemble the precondition expected by update. After the 
call, update’s postcondition gives us the desired postcondition. 

We focused here on the details of acquire’s proof, but very similar manipulations 
are required for reasoning about the recursive call in update’s implementation.¹² The 
main difference there is that if the if-condition wrapping the recursive call is false then 
either the last-modified node has no successor (and so there is no outstanding inflow 
change needed), or we have from = to which implies that the “fake” interface is actually 
the same as the currently correct one. 

Despite the property proved for the PIP example being a rather delicate recursive 
invariant over the (potentially cyclic) graph, the power of our framework enables extremely 
succinct specifications for the example, and proofs which require the application of 
relatively few generic lemmas. The integration with standard separation logic reasoning, and 
the complementary separation algebras provided by flow interfaces allow decomposition 
and recomposition to be simple proof steps. For this proof, we integrated with standard 
sequential separation logic, but in the next section we will show that compatibility with 
concurrent SL techniques is similarly straightforward. 


12 We provide further proof outlines in Appendix C of the TR [23]. 
Fig. 4: A potential state of the Harris list with explicit memory management. fnext 
pointers are shown with dashed edges, marked nodes are shaded gray, and null pointers 
are omitted for clarity. 


4 Advanced Flow Reasoning and the Harris List 


This section introduces some advanced foundational flow framework theory and 
demonstrates its use in the proof of the Harris list. We note that [22] presented a proof of this 
data structure in the original flow framework. The proof given here shows that the new 
framework eliminates the need for the customized concurrent separation logic defined 
in [22]. We start with a recap of Harris’ algorithm adapted from [22]. 


4.1 The Harris List Algorithm 


The power of flow-based reasoning is exhibited in the proof of overlaid data structures 
such as the Harris list, a concurrent non-blocking linked list algorithm [12]. This 
algorithm implements a set data structure as a sorted list, and uses atomic compare-and-swap 
(CAS) operations to allow a high degree of parallelism. As with the sequential linked 
list, Harris’ algorithm inserts a new key k into the list by finding nodes k₁, k₂ such that 
k₁ < k < k₂, setting k to point to k₂, and using a CAS to change k₁ to point to k only 
if it was still pointing to k₂. However, a similar approach fails for the delete operation. 
If we had consecutive nodes k₁, k₂, k₃ and we wanted to delete k₂ from the list (say by 
setting k₁ to point to k₃), there is no way to ensure with one CAS that k₂ and k₃ are also 
still adjacent (another thread could have inserted/deleted in between them). 

Harris’ solution is a two-step deletion: first atomically mark k₂ as deleted (by setting 
a mark bit on its successor field) and then later remove it from the list using a single 
CAS. After a node is marked, no thread can insert or delete to its right, hence a thread 
that wanted to insert k’ to the right of k₂ would first remove k₂ from the list and then 
insert k’ as the successor of k₁. 
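
The two-step deletion can be sketched as follows. This is an illustrative Python model, not Harris' implementation: Python has no hardware CAS, so the cas method below simulates it with a per-node lock, and the names Node, mark, unlink, and insert_after are hypothetical. The successor pointer and the mark bit are stored together so that one simulated CAS updates both.

```python
import threading

class Node:
    def __init__(self, key, nxt=None):
        self.key = key
        self._succ = (nxt, False)       # (successor, mark bit), updated together
        self._lock = threading.Lock()   # stands in for an atomic CAS instruction

    def get(self):
        return self._succ

    def cas(self, expected, new):
        """Replace the successor field only if it still equals `expected`."""
        with self._lock:
            cur = self._succ
            if cur[0] is expected[0] and cur[1] == expected[1]:
                self._succ = new
                return True
            return False

def mark(node):
    """Step 1: logically delete `node` by setting the mark bit on its successor field."""
    while True:
        nxt, marked = node.get()
        if marked:
            return False                # somebody else already deleted it
        if node.cas((nxt, False), (nxt, True)):
            return True

def unlink(pred, node):
    """Step 2: physically remove a marked `node` with one CAS on its predecessor."""
    nxt, marked = node.get()
    assert marked, "only marked nodes may be unlinked"
    return pred.cas((node, False), (nxt, False))

def insert_after(pred, new_node, succ):
    """Insert `new_node` between `pred` and `succ`; fails if `pred` changed or is marked."""
    new_node._succ = (succ, False)
    return pred.cas((succ, False), (new_node, False))

k3 = Node(3)
k2 = Node(2, k3)
k1 = Node(1, k2)
print(mark(k2))                          # logical deletion succeeds
print(insert_after(k2, Node(2.5), k3))   # rejected: k2 is marked
print(unlink(k1, k2))                    # physical removal
```

Because the mark bit lives in the same word as the successor pointer, any insertion to the right of a marked node fails its CAS, which is the property the text relies on.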

In a non-garbage-collected environment, unlinked nodes cannot be immediately freed 
as suspended threads might continue to hold a reference to them. A common solution 
is to maintain a second “free list” to which marked nodes are added before they are 
unlinked from the main list (this is the so-called drain technique). These nodes are then 
labelled with a timestamp, which is used by a maintenance thread to free them when it is 
safe to do so. This leads to the kind of data structure shown in Figure 4, where each node 
has two pointer fields: a next field for the main list and an fnext field for the free list 
(the list from fh to ft via dashed edges). Threads that have been suspended while holding 
a reference to a node that was added to the free list can simply continue traversing the 
next pointers to find their way back to the unmarked nodes of the main list. 

Fig. 5: Examples of graphs that motivate effective acyclicity. All graphs use the 
path-counting flow domain, the flow is displayed inside each node, and the inflow is displayed 
as curved arrows to the top-left of nodes. (a) shows a graph and inflow that has no 
solution to (FlowEqn); (b) has many solutions; (c) shows a modification that preserves 
the interface of the modified nodes, yet goes from a graph that has a unique flow to one 
that has many solutions to (FlowEqn). 

Even for seemingly simple properties such as that the Harris list is memory safe and 
not leaking memory, the proof will rely on the following non-trivial invariants: 


(a) The data structure consists of two (potentially overlapping) lists: a list on next 
edges beginning at mh and one on fnext edges beginning at fh. 

(b) The two lists are null terminated and next edges from nodes in the free list point to 
nodes in the free list or main list. 

(c) All nodes in the free list are marked. 

(d) ft is an element in the free list (due to concurrency, it’s not always the tail). 


Challenges. To prove that Harris’ algorithm maintains the invariants listed above we 
must tackle a number of challenges. First, we must construct flow domains that allow us 
to describe overlaid data structures, such as the overlapping main and free lists (§4.2). 
Second, the flow-based proofs we have seen so far work by showing that the interface of 
some modified region is unchanged. However, if we consider a program that allocates 
and inserts a new node into a data structure (like the insert method of Harris), then the 
interface cannot be the same since the domain has changed (it has increased by the 
newly allocated node). We must thus have a means to reason about preservation of flows 
by modifications that allocate new nodes (§4.3). The third issue is that in some flow 
domains, there exist graphs G and inflows in for which no solutions to the flow equation 
(FlowEqn) exist. For instance, consider the path-counting flow domain and the graph 
in Figure 5(a). Since we would need to use the path-counting flow in the proof of the 
Harris list to encode its structural invariants, this presents a challenge (§4.4). 

We will next see how to overcome these three challenges in turn, and then apply 
these solutions to the proof of the Harris list in §4.5. 
4.2 Product Flows for Reasoning about Overlays 


An important fact about flows is that any flow of a graph over a product of two flow 
domains is the product of the flows on each flow domain component. 


Lemma 3. Given two flow domains (M₁, +₁, 0₁, E₁) and (M₂, +₂, 0₂, E₂), the product 
domain (M₁ × M₂, +, (0₁, 0₂), E) is a flow domain, where + and E are the pointwise 
liftings of (+₁, +₂) and (E₁, E₂), respectively, to the domain M₁ × M₂. 


This lemma greatly simplifies reasoning about overlaid graph structures; we will use 
the product of two path-counting flows to describe a structure consisting of two overlaid 
lists that make up the Harris list. 
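
As an illustration of this fact, the Python sketch below computes a path-counting flow for each of two edge sets separately, then computes the flow over the product domain directly, and checks that the two agree. It is a sketch under assumptions: the graph (a main list mh → a → c, a free list fh → b, and a stale next edge b → c) is hypothetical, and flows are found by simple iteration, which is only guaranteed to settle when no cycle carries flow.

```python
def path_count_flow(nodes, edges, inflow, max_rounds=1000):
    """Iterate fl(n) = in(n) + sum of fl(p) over edges (p, n) until it settles."""
    fl = dict(inflow)
    for _ in range(max_rounds):
        new = {n: inflow[n] + sum(fl[p] for (p, s) in edges if s == n) for n in nodes}
        if new == fl:
            return fl
        fl = new
    raise ValueError("iteration did not converge")

def product_flow(nodes, next_edges, fnext_edges, inflow, max_rounds=1000):
    """Same iteration over pairs: next edges carry component 0, fnext edges component 1."""
    fl = dict(inflow)
    for _ in range(max_rounds):
        new = {}
        for n in nodes:
            a = inflow[n][0] + sum(fl[p][0] for (p, s) in next_edges if s == n)
            b = inflow[n][1] + sum(fl[p][1] for (p, s) in fnext_edges if s == n)
            new[n] = (a, b)
        if new == fl:
            return fl
        fl = new
    raise ValueError("iteration did not converge")

nodes = ["mh", "a", "b", "c", "fh"]
next_edges = [("mh", "a"), ("a", "c"), ("b", "c")]   # b was unlinked but still points to c
fnext_edges = [("fh", "b")]
inflow = {n: (0, 0) for n in nodes}
inflow["mh"] = (1, 0)
inflow["fh"] = (0, 1)

prod = product_flow(nodes, next_edges, fnext_edges, inflow)
main = path_count_flow(nodes, next_edges, {n: inflow[n][0] for n in nodes})
free = path_count_flow(nodes, fnext_edges, {n: inflow[n][1] for n in nodes})
print(prod)
print(all(prod[n] == (main[n], free[n]) for n in nodes))
```

Node b ends up with flow (0, 1), meaning it is only on the free list, while c keeps flow (1, 0) even though b still points to it, because no main-list path reaches b.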


4.3 Contextual Extensions and the Replacement Theorem 


In general, when modifying a flow graph H to another flow graph H’, requiring that H’ 
satisfies precisely the same interface int(H) can be too strong a condition as it does not 
permit allocating new nodes. Instead, we want to allow int(H’) to differ from int(H) 
in that the new interface could have a larger domain, as long as the edges from the new 
nodes do not change the outflow of the modified region. 


Definition 7. An interface I = (in, out) is contextually extended by I′ = (in′, out′), 
written I ≾ I′, if and only if the following conditions all hold: 


(1) dom(in) ⊆ dom(in′), 
(2) ∀n ∈ dom(in). in(n) = in′(n), and 
(3) ∀n′ ∉ dom(in′). out(n′) = out′(n′). 

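
The three conditions of Definition 7 translate directly into a small check. The Python sketch below is illustrative only: it assumes interfaces are represented as pairs of dictionaries (inflow, outflow), that an outflow dictionary maps every unlisted node to zero, and that flow values are plain numbers; the function name contextually_extends is hypothetical.

```python
def contextually_extends(I, I2, zero=0):
    """Return True if interface I is contextually extended by interface I2."""
    inn, out = I
    inn2, out2 = I2
    # (1) the domain may only grow
    if not set(inn) <= set(inn2):
        return False
    # (2) inflow of the old nodes is unchanged
    if any(inn[n] != inn2[n] for n in inn):
        return False
    # (3) outflow to every node outside the new domain is unchanged
    outside = (set(out) | set(out2)) - set(inn2)
    return all(out.get(n, zero) == out2.get(n, zero) for n in outside)

I_old = ({"n1": 1}, {"n3": 1})
I_new = ({"n1": 1, "n2": 0}, {"n3": 1})     # n2 newly allocated, same outflow
I_bad = ({"n1": 1, "n2": 0}, {"n3": 2})     # outflow to n3 changed
print(contextually_extends(I_old, I_new))
print(contextually_extends(I_old, I_bad))
```

Note that condition (3) says nothing about outflow to nodes that are inside the new domain, which is what allows the old region to have had (zero or non-zero) outflow recorded for an address that is later allocated.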
The following theorem states that contextual extension preserves composability and 
is itself preserved under interface composition. 


Theorem 2 (Replacement Theorem). If I = I₁ ⊕ I₂ and I₁ ≾ I₁′ are all valid 
interfaces such that I₁′ ∩ I₂ = ∅ and ∀n ∈ I₁′ \ I₁. I₂.out(n) = 0, then there exists a 
valid I′ = I₁′ ⊕ I₂ such that I ≾ I′. 


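In terms of our flow predicates, this theorem gives rise to the following adaptation of 
the (REPL) rule: 

\[
\begin{aligned}
& \mathsf{Gr}(X_1', H_1') * \mathsf{Gr}(X_2, H_2) * H = H_1 \odot H_2 * \mathsf{int}(H_1) \precsim \mathsf{int}(H_1') \\
& \quad \models \exists H'.\ \mathsf{Gr}(X_1' \uplus X_2, H') * H' = H_1' \odot H_2 * \mathsf{int}(H) \precsim \mathsf{int}(H') \qquad \text{(REPL+)}
\end{aligned}
\]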


The rule is derived from the Replacement Theorem by instantiating with 
I = int(H), I₁ = int(H₁), I₂ = int(H₂) and I₁′ = int(H₁′). We know I₁ ≾ I₁′; 
H = H₁ ⊙ H₂ tells us (by Lemma 2) that I = I₁ ⊕ I₂, and Gr(X₁′, H₁′) ∗ Gr(X₂, H₂) 
gives us I₁′ ∩ I₂ = ∅. The final condition of the Replacement Theorem is to prove that 
there is no outflow from X₂ to any newly allocated node in X₁′. While we can use 
additional ghost state to prove such constraints in our proofs, if we assume that the 
memory allocator only allocates fresh addresses and restrict the abstraction function 
edge to only propagate flow along an edge (n, n’) if n has a (non-ghost) field with a 
reference to n’ then this condition is always true. For simplicity, and to keep the focus of 
this paper on the flow reasoning, we make this assumption in the Harris list proof. 
4.4 Existence and Uniqueness of Flows 


We typically express global properties of a graph G = (N, e) by fixing a global inflow 
in: N → M and then constraining the flow of each node in N using node-local 
conditions. However, as we discussed at the beginning of this section, there is no general 
guarantee that a flow exists or is unique for a given in and G. The remainder of this 
section presents two complementary conditions under which we can prove that our flow 
fixpoint equation always has a unique solution. To this end, we say that a flow domain 
(M, +, 0, E) has unique flows if for every graph (N, e) over this flow domain and inflow 
in: N → M, there exists a unique fl that satisfies the flow equation FlowEqn(in, e, fl). 
But first, we briefly recall some more monoid theory. 

We say M is positive if m₁ + m₂ = 0 implies that m₁ = m₂ = 0. For a positive 
monoid M, we can define a partial order ≤ on its elements as m₁ ≤ m₂ if and only if 
∃m₃. m₁ + m₃ = m₂. This definition implies that every m ∈ M satisfies 0 ≤ m. 

For e, e′: M → M, we write e + e′ for the function that maps m ∈ M to e(m) + 
e′(m). We lift this construction to a set of functions E and write it as Σ_{e ∈ E} e. 


Definition 8. A function e: M → M is called an endomorphism on M if for every 
m₁, m₂ ∈ M, e(m₁ + m₂) = e(m₁) + e(m₂). We denote the set of all endomorphisms 
on M by End(M). 


Note that for cancellative M, e(0) = 0 for every endomorphism e ∈ End(M). 
Note further that e + e′ ∈ End(M) for any e, e′ ∈ End(M). Similarly, for finite sets 
E ⊆ End(M), Σ_{e ∈ E} e ∈ End(M). We say that a set of endomorphisms E ⊆ End(M) 
is closed if for every e, e′ ∈ E, e ∘ e′ ∈ E and e + e′ ∈ E. 


Nilpotent Cycles. Let (M, +, 0, E) be a flow domain where every edge function e ∈ E 
is an endomorphism on M. In this case, we can show that the flow of a node n is the 
sum of the flow as computed along each path in the graph that ends at n. Suppose we 
additionally know that the edge functions are defined such that their composition along 
any cycle in the graph eventually becomes the identically zero function. We then need 
only consider finitely many paths to compute the flow of a node, which means the flow 
equation has a unique solution. 


Definition 9. A closed set of endomorphisms E ⊆ End(M) is called nilpotent if there 
exists p > 1 such that e^p ≡ 0 for every e ∈ E. 


Example 5. The flow domain (ℕ², +, (0, 0), {(λ(x, y). (0, c·x)) | c ∈ ℕ}) contains 
nilpotent edge functions that shift the first component of the flow to the second (with 
a scaling factor). This domain can be used to express the property that every node in a 
graph is reachable from the root via a single edge (by requiring the flow of every node to 
be (0, 1) under the inflow (λn. (n = r ? (1, 0) : (0, 0)))). 
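
The following Python sketch illustrates Example 5 on a small, hypothetical graph. It checks that composing any two edge functions of this domain gives the zero function (so p = 2 works in Definition 9) and that the flow equation settles quickly even when the graph contains a cycle. The helper names edge_fn, compose, and solve_flow are assumptions made for the illustration, and the solver is plain iteration from the inflow.

```python
def edge_fn(c):
    """Edge function (x, y) -> (0, c * x): shift the first component to the second."""
    return lambda m: (0, c * m[0])

def compose(f, g):
    """Apply f first, then g."""
    return lambda m: g(f(m))

def solve_flow(nodes, edges, inflow, rounds=10):
    """Iterate fl(n) = in(n) + sum of e(p, n)(fl(p)) until nothing changes."""
    fl = dict(inflow)
    for _ in range(rounds):
        new = {}
        for n in nodes:
            x, y = inflow[n]
            for (p, s), c in edges.items():
                if s == n:
                    dx, dy = edge_fn(c)(fl[p])
                    x, y = x + dx, y + dy
            new[n] = (x, y)
        if new == fl:
            return fl
        fl = new
    raise ValueError("iteration did not converge")

# Any two-step composition is identically zero.
print(compose(edge_fn(2), edge_fn(3))((5, 7)))

# Root r with edges to a and b, plus a back edge a -> r that closes a cycle.
nodes = ["r", "a", "b"]
edges = {("r", "a"): 1, ("r", "b"): 1, ("a", "r"): 1}
inflow = {"r": (1, 0), "a": (0, 0), "b": (0, 0)}
print(solve_flow(nodes, edges, inflow))
```

The back edge from a to r contributes nothing, because the flow of a has a zero first component; this is the effect of nilpotency that makes only finitely many paths relevant.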


Before we prove that nilpotent endomorphisms lead to unique flows, we present a 
useful notion when dealing with endomorphic flow domains. 


Definition 10. The capacity of a flow graph G = (N, e) is cap(G): N × N → (M → 
M), defined inductively as cap(G) := cap^|G|(G), where cap⁰(G)(n, n′) := δ_{n=n′} and 

For a flow graph H = (N, e, fl), we write cap(H)(n, n′) = cap((N, e))(n, n′) 
for the capacity of the underlying graph. Intuitively, cap(G)(n, n′) is the function that 
summarizes how flow is routed from any source node n in G to any other node n′, 
including those outside of G. 

We can now show that if all edges of a flow graph are labelled with edge functions from a 
nilpotent set of endomorphisms, then the flow equation has a unique solution: 


Lemma 4. If (M,+,0, E) is a flow domain such that M is a positive monoid and E is 
a nilpotent set of endomorphisms, then this flow domain has unique flows. 


Effectively Acyclic Flow Graphs. There are some flow domains that compute flows 
useful in practice, but which do not guarantee either existence or uniqueness of fixpoints 
a priori for all graphs. For example, the path-counting flow (introduced in an earlier 
example) is one where for certain graphs, there exist no solutions to the flow equation 
(see Figure 5(a)), and for others, there can exist more than one (in Figure 5(b), the 
nodes marked with x can have any path count, as long as they both have the same value). 

In such cases, we explore how to restrict the class of graphs we use in our flow-based 
proofs such that each graph has a unique fixpoint; the difficulty is that this restriction must 
be respected for composition of our graphs. Here, we study the class of flow domains 
(M,+,0, E) such that M is a positive monoid and E is a set of reduced endomorphisms 
(defined below). In such domains we can decompose the flow computations into the 
various paths in the graph, and achieve unique fixpoints by restricting the kinds of cycles 
graphs can have. 


Definition 11. A flow graph H = (N, e, fl) is effectively acyclic (EA) if for every 1 ≤ k 
and n₁, …, n_k ∈ N, 

\[
\mathsf{fl}(n_1) \triangleright e(n_1, n_2) \triangleright \cdots \triangleright e(n_{k-1}, n_k) \triangleright e(n_k, n_1) = 0.
\]


The simplest example of an effectively acyclic graph is one where the edges with 
non-zero edge functions form an acyclic graph. However, our semantic condition is 
weaker: for example, when reasoning about two overlaid acyclic lists whose union 
happens to form a cycle, a product of two path-counting domains will satisfy effective 
acyclicity because the composition of different types of edges results in the zero function. 
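
The overlaid-lists case can be checked mechanically on a tiny example. The Python sketch below is a bounded, brute-force illustration of Definition 11 for the product of two path-counting domains, not a decision procedure: it only tries node sequences up to a given length, it does not verify that fl actually satisfies the flow equation, and it assumes each edge function is represented by a pair of 0/1 multipliers (one per component). The names push and effectively_acyclic are hypothetical.

```python
from itertools import product

ZERO = (0, 0)

def push(m, e):
    """m |> e: apply an edge function given as per-component multipliers."""
    return (e[0] * m[0], e[1] * m[1])

def effectively_acyclic(nodes, edge, fl, max_len=None):
    """Check the EA condition for all node sequences of length 1..max_len."""
    max_len = max_len or len(nodes)
    for k in range(1, max_len + 1):
        for seq in product(nodes, repeat=k):
            m = fl[seq[0]]
            for i in range(k):
                m = push(m, edge.get((seq[i], seq[(i + 1) % k]), ZERO))
            if m != ZERO:
                return False
    return True

nodes = ["a", "b"]
# A next edge a -> b and an fnext edge b -> a: the union of the two lists is a cycle.
overlay = {("a", "b"): (1, 0), ("b", "a"): (0, 1)}
fl_overlay = {"a": (1, 1), "b": (1, 1)}
print(effectively_acyclic(nodes, overlay, fl_overlay))

# Two next edges forming a genuine cycle on the first component.
cyclic = {("a", "b"): (1, 0), ("b", "a"): (1, 0)}
fl_cyclic = {"a": (1, 0), "b": (1, 0)}
print(effectively_acyclic(nodes, cyclic, fl_cyclic))
```

In the first graph, following the next edge kills the second component and following the fnext edge kills the first, so nothing survives a full trip around the cycle.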


Lemma 5. Let (M, +, 0, E) be a flow domain such that M is a positive monoid and 
E is a closed set of endomorphisms. Given a graph (N, e) over this flow domain and 
inflow in: N → M, if there exists a flow graph H = (N, e, fl) that is effectively acyclic, 
then fl is unique. 


While the restriction to effectively acyclic flow graphs guarantees us that the flow is 
the unique fixpoint of the flow equation, it is not easy to show that modifications to the 
graph preserve EA while reasoning locally. Even modifying a subgraph to another with 
the same flow interface (which we know guarantees that it will compose with any context) 
can inadvertently create a cycle in the larger composite graph. For instance, consider 
Figure 5(c), which shows a modification to nodes {n₃, n₄} (the boxed blue region). The 
interface of this region is ({n₃ ↦ 1, n₄ ↦ 1}, {n₅ ↦ 1, n₂ ↦ 1}), and so swapping 
the edges of n₃ and n₄ preserves this interface. However, the resulting graph, despite 
composing with the context to form a valid flow graph, is not EA (in this case, it has 
multiple solutions to the flow equation). This shows that flow interfaces are not powerful 
enough to preserve effective acyclicity. For a special class of endomorphisms, we show 
that a local property of the modified subgraph can be checked, which implies that the 
modified composite graph continues to be EA. 


Definition 12. A closed set of endomorphisms E ⊆ End(M) is called reduced if e ∘ e ≡ 
λ₀ implies e ≡ λ₀ for every e ∈ E. 


Note that if E is reduced, then no e ∈ E other than λ₀ can be nilpotent. In that sense, 
this class of instantiations is complementary to the nilpotent class. 


Example 6. Examples of flow domains that fall into this class include positive semirings 
of reduced rings (with the additive monoid of the semiring being the aggregation monoid 
of the flow domain and E being any set of functions that multiply their argument with 
a constant flow value). Note that any direct product of integral rings is a reduced ring. 
Hence, products of the path-counting flow domain are a special case. 


For reduced endomorphisms, it suffices to check that a modification preserves the 
flow routed between every pair of source and sink node in order to ensure that it does 
not create any new cycles in any composite graph. 


Definition 13. A flow graph H′ is a subflow-preserving extension of H, for which we 
write H ≾ₛ H′, if the following conditions all hold: 


(1) int(H) ≾ int(H′) 
(2) ∀n ∈ H, n′ ∉ H′, m. m ≤ inf(H)(n) ⇒ m ▷ cap(H)(n, n′) = m ▷ cap(H′)(n, n′) 
(3) ∀n ∈ H′ \ H, n′ ∉ H′, m. m ≤ inf(H′)(n) ⇒ m ▷ cap(H′)(n, n′) = 0 


This pairwise check, apart from requiring the interface of the modified region to be 
unchanged, also permits allocating new nodes as long as no flow is routed via the new 
nodes (condition (3)). We now show that it is sufficient to check that a modification is a 
subflow-preserving extension to guarantee composition back to an effectively acyclic 
composite graph: 


Theorem 3. Let (M, +, 0, E) be a flow domain such that M is a positive monoid and E 
is a reduced set of endomorphisms. If H = H₁ ⊙ H₂ and H₁ ≾ₛ H₁′ are all effectively 
acyclic flow graphs such that H₁′ ∩ H₂ = ∅ and ∀n ∈ H₁′ \ H₁. outf(H₂)(n) = 0, then 
there exists an effectively acyclic flow graph H′ = H₁′ ⊙ H₂ such that H ≾ₛ H′. 


We define effectively acyclic versions of our flow graph predicates, Nₐ(x, H) and 
Grₐ(X, H), that additionally constrain H to be effectively acyclic. The above theorem 
yields the following variant of the (REPL) rule for EA graphs: 

\[
\begin{aligned}
& \mathsf{Gr}_a(X_1', H_1') * \mathsf{Gr}_a(X_2, H_2) * H = H_1 \odot H_2 * H_1 \precsim_s H_1' \\
& \quad \models \exists H'.\ \mathsf{Gr}_a(X_1' \uplus X_2, H') * H' = H_1' \odot H_2 * H \precsim_s H' \qquad \text{(REPLEA)}
\end{aligned}
\]
4.5 Proof of the Harris List 


We use the techniques seen in this section in the proof of the Harris list. As the data 
structure consists of two potentially overlapping lists, we use Lemma 3 to construct a 
product flow domain of two path-counting flows: one tracks the path count from the 
head of the main list, and one from the head of the free list. We also work under the 
effectively acyclic restriction (i.e. we use the Nₐ and Grₐ predicates), both in order to 
obtain the desired interpretation of the flow as well as to ensure existence of flows in this 
flow domain. 

We instantiate the framework using the following definitions of parameters: 


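\[
\begin{aligned}
\mathit{fs} &:= \{\mathtt{key}\colon k,\ \mathtt{next}\colon y,\ \mathtt{fnext}\colon z\} \\
\mathsf{edge}(x, \mathit{fs}, v) &:= (v = \mathit{null}\ ?\ \lambda_0 : (v = y \land y \neq z\ ?\ \lambda_{(1,0)} \\
&\qquad : (v \neq y \land v = z\ ?\ \lambda_{(0,1)} : (v = y \land y = z\ ?\ \lambda_{\mathit{id}} : \lambda_0)))) \\
\gamma(x, \mathit{fs}, I) &:= (I.\mathit{in}(x) \in \{(1,0), (0,1), (1,1)\}) * (I.\mathit{in}(x) \neq (1,0) \Rightarrow M(y)) \\
&\qquad * (x = \mathit{ft} \Rightarrow I.\mathit{in}(x) = (\_, 1)) * (\lnot M(y) \Rightarrow z = \mathit{null}) \\
\varphi(I) &:= I = (\lambda_0[\mathit{mh} \mapsto (1,0)][\mathit{fh} \mapsto (0,1)],\ \lambda_0)
\end{aligned}
\]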


Here, edge encodes the edge functions needed to compute the product of two 
path-counting flows: the first component tracks path-counts from mh on next edges and the 
second tracks path-counts from fh on fnext edges.¹³ The node-local invariant γ says: 
the flow is one of {(1, 0), (0, 1), (1, 1)} (meaning that the node is on one of the two lists, 
invariant (a)); if the flow is not (1, 0) (the node is not only on the main list, i.e. it is 
on the free list) then the node is marked (indicated by M(y), invariant (c)); and if the 
node is ft then it must be on the free list (invariant (d)). The constraint on the global 
interface, φ, says that the inflow picks out mh and fh as the roots of the lists, and there 
is no outgoing flow (thus, all non-null edges must stay within the graph, invariant (b)). 

Since the Harris list is a concurrent algorithm, we perform the proof in rely-guarantee 
separation logic (RGSep) [41]. As before, we do not need to modify the semantics of 
RGSep in any way; our flow-based predicates can be defined, and reasoning using our 
lemmas can be performed, in the logic out-of-the-box. For space reasons, the full proof 
can be found in Appendix D of the TR [23]. 


5 Related Work 


As mentioned in §1, the most closely related work is the flow framework developed by 
some of the authors in [22]. We here present a simplified and generalized meta theory of 
flows that makes the approach much more broadly applicable. There were a number of 
limitations of the prior framework that prevented its application to more general classes 
of examples. 

First, [22] required flow domains to form a semiring; the analogue of edge functions 
are restricted to multiplication with a constant which must come from the same flow 
value set. This restriction made it complex to encode many graph properties of interest. 
For example, one could not easily encode the PIP flow, or a simple flow that counts the 
number of incoming edges to each node. Our foundational flow framework decouples 
the algebraic structure defining how flow is aggregated from the algebraic structure of 
the edge functions. In this way, we obtain a more general framework that applies to many 
more examples, and with simpler flow domains. 

13 We use the shorthands λ₍₁,₀₎ := (λ(m₁, m₂). (m₁, 0)) and λ₍₀,₁₎ := (λ(m₁, m₂). (0, m₂)), 
and denote an anonymous existentially-quantified variable by _. 


Second, in [22], a flow graph did not uniquely determine its inflow (cf. Lemma 1). 
Correspondingly, [22]’s notion of interface included an equivalence class of inflows (all 
those that induce the same flow values). Since, in [22], the interface also determines 
which modifications are permitted by the framework, [22] could only handle 
modifications that preserve the inflow equivalence class. For example, this prevents one from 
reasoning locally about the removal of a single edge from a graph in certain cases (in 
particular, like release does in the PIP). Our foundational flow framework solves 
this problem by requiring that the aggregation operation on flow values is cancellative, 
guaranteeing unique inflows. 


Cancellativity is fundamentally incompatible with [22], which requires the flow 
domain to form an ω-CPO in order to guarantee the existence of unique flows. For 
example, in a graph with two nodes n and n′ with identity edges between them and 
all other edges zero (in [22], edges labelled with 1 and 0), if we have in(n) = 0 
and in(n′) = m for some non-zero m, a solution to the flow equation must satisfy 
fl(n) = m + fl(n). [22] forces such solutions to exist, ruling out cancellativity. To solve 
this problem, we present a new theory which can optionally guarantee unique flows 
when desired and show that requiring cancellativity does not limit expressivity. 


Next, the proofs of programs shown in [22] depend on a bespoke program logic. This 
logic requires new reasoning primitives that are not supported by the logics implemented 
in existing SL-based verification tools. Our general proof technique eliminates the need 
for a dedicated program logic and can be implemented on top of standard separation 
logics and existing SL-based tools. Finally, the underlying separation algebra of the original 
framework makes it hard to use equational reasoning, which is a critical prerequisite for 
enabling proof automation. 


An abundance of SL variants provide complementary mechanisms for modular 
reasoning about programs (e.g. [18, 36, 38]). Most are parameterized by the underlying 
separation algebra; our flow-based reasoning technique easily integrates with these 
existing logics. 

The most common approach to reason about irregular graph structures in SL is to 
use iterated separating conjunction and describe the graph as a set of nodes each 
of which satisfies some local invariant. This approach has the advantage of being able to 
naturally describe general graphs. However, it is hard to express non-local properties that 
involve some form of fixpoint computation over the graph structure. One approach is to 
abstract the program state as a mathematical graph using iterated separating conjunction 
and then express non-local invariants in terms of the abstract graph rather than the 
underlying program state [14, 35, 38]. However, a proof that a modification to the state 
maintains a global invariant of the abstract graph must then often revert back to non-local 
and manual reasoning, involving complex inductive arguments about paths, transitive 
closure, and so on. Our technique also exploits iterated separating conjunction for the 
underlying heap ownership, with the key benefit that flow interfaces exactly capture the 
necessary conditions on a modified subgraph in order to compose with any context and 
preserve desired non-local invariants. 

In recent work, Wang et al. present a Coq-mechanised proof of graph algorithms in 
C, based on a substantial library of graph-related lemmas, both for mathematical and 
heap-based graphs [42]. They prove rich functional properties, integrated with the VST 
tool. In contrast to our work, a substantial suite of lemmas and background properties is 
necessary, since these specialise to particular properties such as reachability. We believe 
that our foundational flow framework could be used to simplify framing lemmas in a 
way which remains parametric in the property in question. 

Proofs of a number of graph algorithms have been mechanized in various verification 
tools and proof assistants, including Tarjan’s SCC algorithm [8], union-find, Kruskal’s 
minimum spanning tree algorithm [13], and network flow algorithms [25]. These proofs 
generally involve non-local reasoning arguments about mathematical graphs. 

An alternative approach to using SL-style reasoning is to commit to global reasoning 
but remain within decidable logics to enable automation [16, 21, 24, 28, 43]. However, 
such logics are restricted to certain classes of graphs and certain types of properties. 
For instance, reasoning about reachability in unbounded graphs with two successors 
per node is undecidable [15]. Recent work by Ter-Gabrielyan et al. shows how 
to deal with modular framing of pairwise reachability specifications in an imperative 
setting. Their framing notion has parallels to our notion of interface composition, but 
allows subgraphs to change the paths visible to their context. The work is specific to 
a reachability relation, and cannot express the rich variety of custom graph properties 
available in our technique. 

Dynamic frames (e.g. as implemented in Dafny) can be used to explicitly 
reason about framing of heap information in a first-order logic. However, by itself, this 
theory does not enable modular reasoning about global graph properties. We believe that 
the flow framework could in principle be adapted to the dynamic frames setting. 


6 Conclusions and Future Work 


We have presented the foundational flow framework, enabling local modular reasoning 
about recursively-defined properties over general graphs. The core reasoning technique 
has been designed to make minimal mathematical requirements, providing great flexi- 
bility in terms of potential instantiations and applications. We identified key classes of 
these instantiations for which we can provide existence and uniqueness guarantees for 
the fixpoint properties our technique addresses and demonstrate our proof technique on 
several challenging examples. As future work, we plan to automate flow-based proofs 
in our new framework using existing tools that support SL-style reasoning such as 
Viper and GRASShopper [34]. 
Abstract. Building network-connected programs and distributed systems 
is a powerful way to provide scalability and availability in a digital, 
always-connected era. However, with great power comes great complexity. 
Reasoning about distributed systems is well-known to be difficult. 

In this paper we present Aneris, a novel framework based on separation 
logic supporting modular, node-local reasoning about concurrent and 
distributed systems. The logic is higher-order, concurrent, with higher- 
order store and network sockets, and is fully mechanized in the Coq proof 
assistant. We use our framework to verify an implementation of a load 
balancer that uses multi-threading to distribute load amongst multiple 
servers and an implementation of the two-phase-commit protocol with 
a replicated logging service as a client. The two examples certify that 
Aneris is well-suited for both horizontal and vertical modular reasoning. 


Keywords: Distributed systems - Separation logic - Higher-order logic - 
Concurrency - Formal verification 


1 Introduction 


Reasoning about distributed systems is notoriously difficult due to their sheer 
complexity. This is largely the reason why previous work has traditionally focused 
on verification of protocols of core network components. In particular, in the 
context of model checking, where safety and liveness assertions [29] are consid- 
ered, tools such as SPIN [9], TLA+ [23], and Mace [17] have been developed. 
More recently, significant contributions have been made in the field of formal 
proofs of implementations of challenging protocols, such as two-phase-commit, 
lease-based key-value stores, Paxos, and Raft [7, 25, 30, 35, 40]. All of these 
developments define domain specific languages (DSLs) specialized for distributed 
systems verification. Protocols and modules proven correct can be compiled to 
an executable, often relying on some trusted code-base. 

Formal reasoning about distributed systems has often been carried out by 
giving an abstract model in the form of a state transition system or flow-chart in 
the tradition of Floyd [5], Lamport [21, 22]. A state is normally taken to be a 
view of the global state and events are observable changes to this state. State 
transition systems are quite versatile and have been used in other verification 
applications. However, reasoning based on state transition systems often suffers 
from a lack of modularity due to their very global nature. As a consequence, separate 
nodes or components cannot be verified in isolation and the system has to be 
verified as a whole. 

IronFleet [7] is the first system that supports node-local reasoning for verifying 
the implementation of programs that run on different nodes. In IronFleet, a 
distributed system is modeled by a transition system. This transition system 
is shown to be refined by the composition of a number of transition systems, 
each pertaining to one of the nodes in the system. Each node in the distributed 
system is shown to be correct and a refinement of its corresponding transition 
system. Nevertheless, IronFleet does not allow you to reason compositionally; a 
correctness proof for a distributed system cannot be used to show the correctness 
of a larger system. 

Higher-order concurrent separation logics (CSLs) [3, 4, 13, 15, 18, 26, 27, 
28, 33, 34, 36, 39] simplify reasoning about higher-order imperative concurrent 
programs by offering facilities for specifying and proving correctness of programs in 
a modular way. Indeed, their support for modular reasoning (a.k.a. compositional 
reasoning) is the key reason for their success. Disel [35] is a separation logic 
that does support compositional reasoning about distributed systems, allowing 
correctness proofs of distributed systems to be used for verifying larger systems. 
However, Disel struggles with node-local reasoning in that it cannot hide node- 
local usage of mutable state. That is, the use of internal state in nodes must be 
exposed in the high-level protocol of the system and changes to the internal state 
are only possible upon sending and receiving messages over the network. 

Finally, both Disel and IronFleet restrict nodes to run only sequential programs 
and no node-level concurrency is supported. 

In this paper we present Aneris, a framework for implementing and reasoning 
about functional correctness of distributed systems. Aneris is based on concurrent 
separation logic and supports modular reasoning with respect to both nodes 
(node-local reasoning) and threads within nodes (thread-local reasoning). The 
Aneris framework consists of a programming language, AnerisLang, for writing 
realistic, real-world distributed systems and a higher-order concurrent separation 
logic for reasoning about these systems. AnerisLang is a concurrent ML-like 
programming language with higher-order functions, local state, threads, and 
network primitives. The operational semantics of the language, naturally, involves 
multiple hosts (each with their own heap and multiple threads) running in a 
network. The Aneris logic is built on top of the Iris framework [13, 15, 18] 
and supports machine-verified formal proofs in the Coq proof assistant about 
distributed systems written in AnerisLang. 


Networking. There are several ways of adding network primitives to a program- 
ming language. One approach is message-passing using first-class communication 
channels à la the π-calculus or using an implementation of the actor model as 
done in high-level languages like Erlang, Elixir, Go, and Scala. However, any 
such implementation is an abstraction built on top of network sockets where all 
data has to be serialized, data packets may be dropped, and packet reception 
may not follow the transmission order. Network sockets are a quintessential 
part of building efficient, real-world distributed systems and all major operating 
systems provide an application programming interface (API) to them. Likewise, 
AnerisLang provides support for datagram-like sockets by directly exposing a 
simple API with the core methods necessary for socket-based communication 
using the User Datagram Protocol (UDP) with duplicate protection. This allows 
for a wide range of real-world systems and protocols to be implemented (and 
verified) using the Aneris framework. 
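
The delivery guarantees described above (messages may be dropped or reordered, but are not duplicated) can be pictured with a small in-memory model. The following Python sketch is only an illustration under those assumptions; the class `LossyNetwork`, the helper `deliver_all`, and the address string are hypothetical and are not part of AnerisLang or its semantics.

```python
import random


class LossyNetwork:
    """Toy model of datagram delivery: drops and reordering, no duplication."""

    def __init__(self, drop_probability, seed=0):
        self.drop_probability = drop_probability
        self.rng = random.Random(seed)  # seeded so that runs are repeatable
        self.in_flight = []             # messages sent but not yet delivered

    def sendto(self, payload, destination):
        # A sent message is either dropped or put in flight exactly once.
        if self.rng.random() >= self.drop_probability:
            self.in_flight.append((destination, payload))

    def receivefrom(self, destination):
        # Deliver some in-flight message for this destination, in arbitrary order.
        candidates = [i for i, (d, _) in enumerate(self.in_flight) if d == destination]
        if not candidates:
            return None                 # nothing has arrived
        index = self.rng.choice(candidates)
        _, payload = self.in_flight.pop(index)
        return payload


def deliver_all(network, destination):
    """Receive until no message for `destination` is left in flight."""
    received = []
    while (payload := network.receivefrom(destination)) is not None:
        received.append(payload)
    return received


net = LossyNetwork(drop_probability=0.3, seed=42)
sent = [f"msg-{i}" for i in range(20)]
for message in sent:
    net.sendto(message, "10.0.0.1:80")
received = deliver_all(net, "10.0.0.1:80")
```

A program written against such a network can rely on every received message having been sent and on receiving it at most once, but on nothing else.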


Modular Reasoning in Aneris. In general, there are two different ways to support 
modular reasoning about distributed systems corresponding to how components 
can be composed. Aneris enables both simultaneously: 


— Vertical composition: when reasoning about programs within each node, one 
is able to compose proofs of different components to prove correctness of the 
whole program. For instance, the specification of a verified data structure, 
e.g. a concurrent queue, should suffice for verifying programs written against 
that data structure, independently of its implementation. 

— Horizontal composition: at each node, a verified thread is composable with 
other verified threads. Similarly, a verified node is composable with other 
verified nodes which potentially engage in different protocols. This naturally 
aids implementing and verifying large-scale distributed systems. 


Node-local variants of the standard rules of CSLs like, for example, the bind rule 
and the frame rule (as explained in Sect. 2) enable vertical reasoning. Sect. 6 
showcases vertical reasoning in Aneris using a replicated distributed logging 
service that is implemented and verified using a separate implementation and 
specification of the two-phase commit protocol. 

Horizontal reasoning in Aneris is achieved through the THREAD-PAR rule and 
the NODE-PAR rule (further explained in Sect. 2), which intuitively say that to 
verify a distributed system, it suffices to verify each thread and each node in 
isolation. This is analogous to how CSLs allow us to reason about multi-threaded 
programs by considering individual threads in isolation; in Aneris we extend 
this methodology to include both threads and nodes. Where most variants of 
concurrent separation logic use some form of an invariant mechanism to reason 
about shared-memory concurrency, we abstract the communication between nodes 
over the network through socket protocols that restrict what can be sent and 
received on a socket and allow us to share ownership of logical resources among 
nodes. Sect. 5 showcases horizontal reasoning in Aneris using an implementation 
and a correctness proof for a simple addition service that uses a load balancer to 
distribute the workload among several addition servers. Each node is verified in 
isolation and composed to form the final distributed system. 


Contributions. In summary, we make the following contributions: 
— We present AnerisLang, a formalized higher-order functional programming 
language for writing distributed systems. The language features higher-order 
store, node-local concurrency, and network sockets, allowing for dynamic cre- 
ation and binding of sockets to addresses with serialization and deserialization 
primitives for encoding and parsing messages. 

— We define the Aneris logic, the first higher-order concurrent separation logic 
with support for network sockets and with support for both node-local and 
thread-local reasoning. 

— We introduce a simple and novel approach to specifying network protocols; 
a mechanism that supports separation-logic-style modular specifications of 
distributed systems. 

— We conduct two case studies that showcase how our framework aids the 
implementation and verification of real-world distributed systems using com- 
positional reasoning: 

   • A replicated logging service that is implemented and verified using a separate 
     implementation and specification of the two-phase commit protocol, 
     demonstrating vertical compositional reasoning. 

   • A load balancer that distributes work on multiple servers by means of 
     node-local multi-threading. We use this to verify a simple addition service 
     that uses the load balancer to distribute its requests over multiple servers, 
     demonstrating horizontal compositional reasoning. 

— We have formalized all of the theory and examples on top of Iris in the Coq 
proof assistant using the MoSeL framework [19]. The Coq formalization can 
be found online at https://iris-project.org/artifacts/2020-esop-aneris.tar.gz. 


Outline. We start by describing the core concepts of the Aneris framework in 
Sec. 2. We then describe the AnerisLang programming language (Sec. 3) before 
presenting the Aneris logic proof rules and stating our adequacy theorem, i.e., 
soundness of Aneris, in Sec. 4. Subsequently, we use the logic to verify a load 
balancer (Sec. 5) and a two-phase-commit implementation with a replicated 
logging client (Sec. 6). We discuss related work in Sec. 7 and conclude in Sec. 8. 


2 The Core Concepts of Aneris 


In this section we present our methodology for modular verification of distributed 
systems. We begin by recalling the ideas of thread-local reasoning and protocols 
from concurrent separation logic and explain how we lift those ideas to node- 
local reasoning. Finally, we illustrate the Aneris methodology for specifying, 
implementing, and verifying distributed systems by developing a simple addition 
service and a lock server. The distributed systems are composed of individually 
verified concurrently running nodes communicating asynchronously by exchanging 
messages that can be reordered or dropped. 


2.1 Local and Thread-Local Reasoning 


The most important feature of (concurrent) separation logic is, arguably, how 
it enables scalable modular reasoning about pointer-manipulating programs. 
Separation logic is a resource logic, in the sense that propositions denote not only 
facts about the state, but ownership of resources. Originally, separation logic [32] 
was introduced for modular reasoning about the heap—i.e. the notion of resource 
was fixed to be logical pieces of the heap. The essential idea is that we can give a 
local specification {P}e{v.Q} to a program e involving only the footprint of e. 
Hence, while verifying e, we need not consider the possibility that another piece 
of code in the program might interfere with e; the program e can be verified 
without concern for the environment in which e may occur. Local specifications 
can then be lifted to more global specifications by framing and binding: 


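\[
\frac{\{P\}\; e\; \{v.\,Q\}}
     {\{P * R\}\; e\; \{v.\,Q * R\}}
\qquad\qquad
\frac{\{P\}\; e\; \{v.\,Q\} \qquad \forall v.\ \{Q\}\; K[v]\; \{w.\,R\}}
     {\{P\}\; K[e]\; \{w.\,R\}}
\]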


where K denotes an evaluation context. The symbol $*$ denotes separating 
conjunction. Intuitively, $P * Q$ holds for a given resource (in this case a heap) if 
it can be divided into two disjoint resources such that P holds for one and Q 
holds for the other. Thus, the frame rule essentially says that executing e, for 
which we know $\{P\}\; e\; \{v.\,Q\}$, cannot possibly affect parts of the heap that are 
separate from its footprint. Another related separation logic connective is $-\!\!*$, the 
separating implication. The proposition $P \mathrel{-\!\!*} Q$ describes a resource that, combined 
with a disjoint resource satisfying P, results in a resource satisfying Q. 
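
To make the reading of separating conjunction concrete, the following Python sketch models heaps as dictionaries and assertions as predicates on heaps. It is a brute-force illustration for tiny heaps only; the helper names (`splits`, `points_to`, `star`, `increment`) are hypothetical and do not correspond to anything in Aneris or Iris.

```python
def splits(heap):
    """Yield every way of dividing `heap` into two disjoint sub-heaps."""
    locations = list(heap)
    for mask in range(2 ** len(locations)):
        left = {l: heap[l] for i, l in enumerate(locations) if (mask >> i) & 1}
        right = {l: heap[l] for l in locations if l not in left}
        yield left, right


def points_to(location, value):
    """The assertion that the heap consists of exactly one cell holding `value`."""
    return lambda heap: heap == {location: value}


def star(p, q):
    """Separating conjunction: some split satisfies p on one part and q on the other."""
    return lambda heap: any(p(h1) and q(h2) for h1, h2 in splits(heap))


def increment(location):
    """A command whose footprint is the single cell at `location`."""
    def run(heap):
        updated = dict(heap)
        updated[location] = heap[location] + 1
        return updated
    return run


# Local behaviour: the command maps the heap {x: 1} to the heap {x: 2}.
# Framing: running it on a larger heap leaves the disjoint cell y untouched.
pre = star(points_to("x", 1), points_to("y", 7))
post = star(points_to("x", 2), points_to("y", 7))
heap_before = {"x": 1, "y": 7}
heap_after = increment("x")(heap_before)
```

Note that `star(points_to("x", 1), points_to("x", 1))` is unsatisfiable in this model, since one cell cannot be placed in both halves of a split; this is the sense in which the two sides of a separating conjunction own disjoint resources.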

Since its introduction, separation logic has been extended to resources be- 
yond heaps and with more sophisticated mechanisms for modular control of 
interference. Concurrent separation logics (CSLs) [28] allow reasoning about 
concurrent programs and a preeminent feature of these program logics is again 
the support for modular reasoning, in this case with respect to concurrency 
through thread-local reasoning. When reasoning about a concurrent program we 
consider threads one at a time and need not reason about interleavings of threads 
explicitly. In a way, our frame here includes, in addition to the shared fragments 
of the heap and other resources, the execution of other threads which can be 
interleaved throughout the execution of the thread being verified. This can be 
seen from the following disjoint concurrency rule: 


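\[
\text{THREAD-PAR}\quad
\frac{\{P_1\}\; \langle n; e_1 \rangle\; \{v.\,Q_1\} \qquad \{P_2\}\; \langle n; e_2 \rangle\; \{v.\,Q_2\}}
     {\{P_1 * P_2\}\; \langle n; e_1 \mathbin{\|} e_2 \rangle\; \{v.\ \exists v_1, v_2.\ v = (v_1, v_2) * Q_1[v_1/v] * Q_2[v_2/v]\}}
\]


where $e_1 \mathbin{\|} e_2$ denotes parallel composition of expressions $e_1$ and $e_2$ and we use 
the notation $\langle n; e \rangle$ to denote an expression e running on a node with identifier 
n.¹ 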

Inevitably, at some point threads typically have to communicate with one 
another through some kind of shared state, an unavoidable form of interference. 
The original CSL used a simple form of resource invariant in which ownership of 
a shared resource can be transferred between threads. 


1 In a language with fork-based concurrency, the parallel composition operator is an 
easily defined construct and the rule is derivable from a more general fork-rule. 
A notable program logic in the family of concurrent separation logics is Iris 
that is specifically designed for reasoning about programs written in concurrent 
higher-order imperative programming languages. Iris has already proven to be 
versatile for reasoning about a number of sophisticated properties of programming 
languages [12, 16, 37]. In order to support modular reasoning about concurrent 
programs Iris features (1) impredicative invariants for expressing protocols on 
shared state among multiple threads and (2) allows for encoding of higher-order 
ghost state using a form of partial commutative monoids for reasoning about 
resources. We will give examples of these features and explain them in more 
detail as needed. 


2.2 Node-Local Reasoning 


Programs written in AnerisLang are higher-order imperative concurrent programs 
that run on multiple nodes in a distributed system. When reasoning about 
distributed systems in Aneris, alongside heap-local and thread-local reasoning, 
we also reason node-locally. When proving correctness of AnerisLang programs 
we reason about each node of the system in isolation, akin to how we in CSLs 
reason about each thread in isolation. 

By virtue of building on Iris, reasoning in Aneris is naturally modular with 
respect to separation logic frames and with respect to threads. What Aneris 
adds on top of this is support for node-local reasoning about programs. This is 
expressed by the following rule: 


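\[
\text{NODE-PAR}\quad
\frac{\begin{array}{c}
\{P_1 * \mathsf{IsNode}(n_1) * \mathsf{FreePorts}(ip_1, \mathfrak{P})\}\; \langle n_1; e_1 \rangle\; \{\mathsf{True}\} \\
\{P_2 * \mathsf{IsNode}(n_2) * \mathsf{FreePorts}(ip_2, \mathfrak{P})\}\; \langle n_2; e_2 \rangle\; \{\mathsf{True}\}
\end{array}}
{\{P_1 * P_2 * \mathsf{FreeIp}(ip_1) * \mathsf{FreeIp}(ip_2)\}\; \langle \mathfrak{S};\ \langle n_1; ip_1; e_1 \rangle \mathbin{|||} \langle n_2; ip_2; e_2 \rangle \rangle\; \{\mathsf{True}\}}
\]


where $|||$ denotes parallel composition of two nodes with identifiers $n_1$ and $n_2$ 
running expressions $e_1$ and $e_2$ with IP addresses $ip_1$ and $ip_2$.² The set $\mathfrak{P} = 
\{p \mid 0 \le p \le 65535\}$ denotes a finite set of ports. 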

Note that only a distinguished system node $\mathfrak{S}$ can start new nodes (as 
elaborated on in Sect. 3). In Aneris, the execution of the distributed system 
starts with the execution of $\mathfrak{S}$ as the only node in the system. In order to start 
a new node associated with IP address ip one provides the resource FreeIp(ip), 
which indicates that ip is not used by other nodes. The node can then rely 
on the fact that when it starts, all ports on ip are available. The resource 
IsNode(n) indicates that the node n is a node in the system and keeps track of 
abstract state related to our modeling of node n’s heap and allocated sockets. 
To facilitate modular reasoning, free ports can be split: if $A \cap B = \emptyset$ then 
$\mathsf{FreePorts}(ip, A) * \mathsf{FreePorts}(ip, B) \dashv\vdash \mathsf{FreePorts}(ip, A \cup B)$, where $\dashv\vdash$ denotes 


2 In the same way as the parallel composition rule is derived from a more general 
fork-based rule, this composition rule is also an instance of a more general rule for 
spawning nodes shown in Sect. 3. 
logical equivalence of Aneris propositions (of type iProp). We will use FreePort(a) 
as shorthand for FreePorts(ip, {p}) where a = (ip, p). 
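
The splitting law for free ports can be mimicked by treating the resource as a set of ports that may be divided into disjoint parts and recombined. The Python sketch below is an informal illustration only; the class `FreePorts` and the functions `split` and `combine` are hypothetical stand-ins for the logical resource and do not model the ghost state used in the Coq development.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FreePorts:
    """Ownership of a set of unbound ports on one IP address (toy model)."""
    ip: str
    ports: frozenset


def split(resource, subset):
    """Divide ownership into the given subset and the remaining ports."""
    subset = frozenset(subset)
    if not subset <= resource.ports:
        raise ValueError("cannot split off ports that are not owned")
    return (FreePorts(resource.ip, subset),
            FreePorts(resource.ip, resource.ports - subset))


def combine(left, right):
    """Merge two resources; only meaningful when their port sets are disjoint."""
    if left.ip != right.ip:
        raise ValueError("resources belong to different IP addresses")
    if left.ports & right.ports:
        raise ValueError("overlapping ports: a port would be owned twice")
    return FreePorts(left.ip, left.ports | right.ports)


ALL_PORTS = frozenset(range(0, 65536))
everything = FreePorts("10.0.0.1", ALL_PORTS)
web, rest = split(everything, {80, 443})
```

Here `web` can be handed to one component and `rest` to another, and combining them again yields the original resource.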

Finally, observe that the node-local postconditions are simply True, in contrast 
to the arbitrary thread-local postconditions in the THREAD-PAR rule that carry 
over to the main thread. In the concurrent setting, shared memory provides 
reliable communication and synchronization between the child threads and the 
main thread; in the rule for parallel composition, the main thread will wait for 
the two child processes to finish. In the distributed setting, there are no such 
guarantees and nodes are separate entities that cannot synchronize with the 
distinguished system node. 


Socket Protocols. Similar to how classical CSLs introduce the concept of resource 
invariants for expressing protocols on shared state among multiple threads, we 
introduce the simple and novel concept of socket protocols for expressing protocols 
among multiple nodes. With each socket address—a pair of an IP address and 
a port—a protocol is associated, which restricts what can be communicated on 
that socket. 

A socket protocol is a predicate $\Phi : \mathit{Message} \to \mathit{iProp}$ on incoming messages 
received on a particular socket. One can think of this as a form of rely-guarantee 
reasoning since the socket protocol will be used to restrict the distributed 
environment’s interference with a node on a particular socket. In Aneris we write 
$a \Mapsto \Phi$ to mean that socket address a is governed by the protocol $\Phi$. In particular, 
if $a \Mapsto \Phi$ and $a \Mapsto \Psi$ then $\Phi$ and $\Psi$ are equivalent.³ Moreover, the proposition is 
duplicable: $a \Mapsto \Phi \dashv\vdash a \Mapsto \Phi * a \Mapsto \Phi$. 

Conceptually, a socket is an abstract representation of a handle for a local 
endpoint of some channel. We further restrict channels to use the User Datagram 
Protocol (UDP) which is asynchronous, connectionless, and stateless. In accor- 
dance with UDP, Aneris provides no guarantee of delivery or ordering although 
we assume duplicate protection. We assume duplicate protection to simplify 
our examples, as otherwise the code of all of our examples would have to be 
adapted to cope with duplication of messages. One can think of sockets in Aneris 
as open-ended multi-party communication channels without synchronization. 

It is noteworthy that inter-process communication can happen in two ways. 
Thread-concurrent programs can communicate both through the shared heap and 
by sending messages through sockets. For memory-separated programs running 
on different nodes all communication is by message-passing. 

In the logic, we consider both static and dynamic socket addresses. This 
distinction is entirely abstract and at the level of the logic. Static addresses come 
with primordial protocols, agreed upon before starting the distributed system, 
whereas dynamic addresses do not. Protocols on static addresses are primarily 
intended for addresses pointing to nodes that offer a service. 

To distinguish between static and dynamic addresses, we use a resource 
Fixed(A) which denotes that the addresses in A are static and should have a fixed 


3 The predicate equivalence is under a later modality in order to avoid self-referential 
paradoxes. We omit it for the sake of presentation as this is an orthogonal issue. 
interpretation. This proposition expresses knowledge without asserting ownership 
of resources and is duplicable: $\mathsf{Fixed}(A) \dashv\vdash \mathsf{Fixed}(A) * \mathsf{Fixed}(A)$. 

Corresponding to the two kinds of addresses we have two different rules, 
SOCKETBIND-STATIC and SOCKETBIND-DYNAMIC, for binding an address to a socket 
as seen below. Both rules consume an instance of Fixed(A) and FreePort(a) as well 
as a resource $z \hookrightarrow_n \mathsf{None}$. The latter keeps track of the address associated with 
the socket handle z on node n and ensures that the socket is bound only once as 
further explained in Sect. 4. Notice that the protocol $\Phi$ in SOCKETBIND-DYNAMIC 
can be freely chosen. 


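\[
\text{SOCKETBIND-STATIC}\quad
\{\mathsf{Fixed}(A) * a \in A * \mathsf{FreePort}(a) * z \hookrightarrow_n \mathsf{None}\}\;
\langle n; \mathtt{socketbind}\ z\ a \rangle\;
\{x.\ x = 0 * z \hookrightarrow_n \mathsf{Some}\ a\}
\]

\[
\text{SOCKETBIND-DYNAMIC}\quad
\{\mathsf{Fixed}(A) * a \notin A * \mathsf{FreePort}(a) * z \hookrightarrow_n \mathsf{None}\}\;
\langle n; \mathtt{socketbind}\ z\ a \rangle\;
\{x.\ x = 0 * z \hookrightarrow_n \mathsf{Some}\ a * a \Mapsto \Phi\}
\]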


In the remainder of the paper we will use the following shorthands in order to 
simplify the presentation of our specifications. 


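\[
\begin{array}{rcl}
\mathsf{Static}(a, A, \Phi) &\triangleq& \mathsf{Fixed}(A) * a \in A * \mathsf{FreePort}(a) * a \Mapsto \Phi \\
\mathsf{Dynamic}(a, A) &\triangleq& \mathsf{Fixed}(A) * a \notin A * \mathsf{FreePort}(a)
\end{array}
\]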


2.3 Example: An Addition Service 


To illustrate node-local reasoning, socket protocols, and the Aneris methodology 
for specifying, implementing, and verifying distributed systems we develop a 
simple addition service that offers to add numbers for clients. 

Fig. 1 depicts an implementation of a server and a client written in AnerisLang. 
Notice that the programs look as if they were written in a realistic functional 
language with sockets like OCaml. Messages are strings to make programming 
with sockets easier (similar to send_substring in the Unix module in OCaml). 

The server is parameterized over an address on which it will listen for requests. 
The server allocates a new socket and binds the address to the socket. Then the 
server starts listening for an incoming message on the socket, calling a handler 
function on the message, if any. The handler function will deserialize the message, 
perform the addition, serialize the result, and return it to the sender before 
recursively listening for new messages. 

The client is parameterized over two numbers to compute on, a server address, 
and a client address. The client allocates a new socket, binds the address to the 
socket, and serializes the two numbers. In the end, it sends the serialized message 
    rec server a =
      let skt = socket () in
      socketbind skt a;
      listen skt (rec handler msg from =
        let m = deserialize msg in
        let res = serialize (π1 m + π2 m) in
        sendto skt res from;
        listen skt handler)

    rec client x y srv a =
      let skt = socket () in
      socketbind skt a;
      let m = serialize (x, y) in
      sendto skt m srv;
      let res = listenwait skt in
      deserialize (π1 res)


Fig. 1. An implementation of an addition service and a client written in AnerisLang. 
listen and listenwait are convenient helper functions to be found in the appendix [20]. 


to the server address using the socket and waits for a response, projecting out 
the result of the addition on arrival and deserializing it. 

In order to give the server code a specification we will fix a primordial socket 
protocol that will govern the address given to the server. The protocol will spell 
out how the server relies on the socket. We will use from(m) and body(m) for 
projections of the sender and the message body, respectively, from the message 
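m. We define $\Phi_{add}$ as follows: 


\[
\begin{array}{rl}
\Phi_{add}(m) \triangleq \exists \Psi, x, y. & from(m) \Mapsto \Psi * body(m) = serialize(x, y) * {} \\
& \forall m'.\ body(m') = serialize(x + y) \mathrel{-\!\!*} \Psi(m')
\end{array}
\]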


Intuitively, the protocol demands that the sender of a message m is governed by 
some protocol $\Psi$ and that the message body body(m) must be the serialization 
of two numbers x and y. Moreover, the sender’s protocol must be satisfied if the 
serialization of x + y is sent as a response. 
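
As an informal, executable reading of this protocol, the Python sketch below checks a single request against a sender protocol. It is a simplification under stated assumptions: protocols are modelled as plain Boolean predicates on message bodies rather than separation-logic propositions, the comma-separated wire format is invented for the example, and all function names (`serialize`, `deserialize`, `server_handler`, `client_protocol`, `add_protocol_holds`) are hypothetical.

```python
def serialize(*numbers):
    """Encode integers as a comma-separated string (assumed wire format)."""
    return ",".join(str(n) for n in numbers)


def deserialize(text):
    """Decode a comma-separated string back into a tuple of integers."""
    return tuple(int(part) for part in text.split(","))


def server_handler(body):
    """Mirror of the handler in Fig. 1: parse two numbers, reply with their sum."""
    x, y = deserialize(body)
    return serialize(x + y)


def client_protocol(x, y):
    """What a client accepts on its own socket: exactly the serialized sum."""
    return lambda body: body == serialize(x + y)


def add_protocol_holds(request_body, sender_protocol):
    """Check one request: the body encodes two numbers, and the serialized
    sum satisfies the protocol that governs the sender's address."""
    try:
        x, y = deserialize(request_body)
    except ValueError:
        return False  # not the serialization of two numbers
    return sender_protocol(serialize(x + y))


request = serialize(3, 4)
reply = server_handler(request)
```

A client that sent `(3, 4)` and is governed by `client_protocol(3, 4)` meets the check, and the server's reply is then guaranteed to satisfy that client protocol.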

Using $\Phi_{add}$ as the socket protocol, we can give server the specification 


\[
\{\mathsf{Static}(a, A, \Phi_{add}) * \mathsf{IsNode}(n)\}\; \langle n; \mathtt{server}\ a \rangle\; \{\mathsf{False}\}.
\]


The postcondition is allowed to be False as the program does not terminate. The 
triple guarantees safety which, among other things, means that if the server responds 
to communication on address a it does so according to $\Phi_{add}$. 

Similarly, using $\Phi_{add}$ as a primordial protocol for the server address, we can 
also give client a specification 


\[
\{srv \Mapsto \Phi_{add} * srv \in A * \mathsf{Dynamic}(a, A) * \mathsf{IsNode}(m)\}\;
\langle m; \mathtt{client}\ x\ y\ srv\ a \rangle\;
\{v.\ v = x + y\}
\]

that showcases how the client is able to conclude that the response from the 
server is the sum of the numbers it sent to it. In the proof, when binding a to 
the socket using SOCKETBIND-DYNAMIC, we introduce the proposition $a \Mapsto \Phi_{client}$ 
where 


\[
\Phi_{client}(m) \triangleq body(m) = serialize(x + y)
\]


and use it to instantiate $\Psi$ when satisfying $\Phi_{add}$. Using the two specifications 
and the NODE-PAR rule it is straightforward to specify and verify a distributed 
system composed of, e.g., a server and multiple clients. 


Aneris: A Logic for Modular Reasoning about Distributed Systems 345 


2.4 Example: A Lock Server 


Mutual exclusion in distributed systems is often a necessity and there are many 
different approaches for providing it. The simplest solution is a centralized 
algorithm with a single node acting as the coordinator. We will develop this 
example to showcase a more interesting protocol that relies on ownership transfer 
of spatial resources between nodes to ensure correctness. 


The code for a centralized lock server implementation is shown in Fig. 2. 


    rec lockserver a =
      let lock = ref NONE in
      let skt = socket () in
      socketbind skt a;
      listen skt (rec handler msg from =
        if (msg = "LOCK") then
          match !lock with
            NONE => lock ← SOME (); sendto skt "YES" from
          | SOME _ => sendto skt "NO" from
          end
        else lock ← NONE; sendto skt "RELEASED" from;
        listen skt handler)


Fig. 2. A lock server in AnerisLang. 
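
The handler in Fig. 2 behaves like a two-state machine. The Python sketch below mirrors it as an illustration; the class `LockServer` and its method `handle` are hypothetical names, and the network is left out so that only the state changes and replies are visible.

```python
class LockServer:
    """In-memory mirror of the handler in Fig. 2 (no sockets involved)."""

    def __init__(self):
        self.lock = None  # None plays the role of NONE, () the role of SOME ()

    def handle(self, msg):
        """Process one message body and return the reply body."""
        if msg == "LOCK":
            if self.lock is None:
                self.lock = ()
                return "YES"
            return "NO"
        # Any message other than "LOCK" releases the lock, as in Fig. 2.
        self.lock = None
        return "RELEASED"


server = LockServer()
replies = [server.handle(m) for m in ["LOCK", "LOCK", "RELEASE", "LOCK"]]
```

Note that the code itself does not check who sends a release request; in the paper that obligation is carried by the socket protocol, not by the implementation.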


The lock server declares a node-local variable lock to keep track of whether 
the lock is taken or not. It allocates a socket, binds the input address to the 
socket and continuously listens for incoming messages. When a "LOCK" message 
arrives and the lock is available, the lock gets taken and the server responds 
"YES". If the lock was already taken, the server will respond "NO". Finally, if 
the message was not "LOCK", the lock is released and the server responds with 
"RELEASED". 

Our specification of the lock server will be inspired by how a lock can 
be specified in concurrent separation logic. Thus we first recall what such a 
specification usually looks like. 

Conceptually, a lock can either be unlocked or locked, as described by a 
two-state labeled transition system. 


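[Diagram: two-state labeled transition system for the lock, with states unlocked and locked and the resource K.] 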


In concurrent separation logic, the lock specification does not describe this 
transition system directly, but instead focuses on the resources needed for the 
transitions to take place. In the case of the lock, the resources are simply a 
non-duplicable resource K, which is needed in order to call the lock’s release 
method. Intuitively, this resource corresponds to the key of the lock. 
A typical concurrent separation logic specification for a spin lock module 
looks roughly like the following: 


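\[
\begin{array}{l}
\exists\, \mathsf{isLock}. \\
\quad \phantom{\land}\ \forall v, K.\ \mathsf{isLock}(v, K) \dashv\vdash \mathsf{isLock}(v, K) * \mathsf{isLock}(v, K) \\
\quad \land\ \forall v, K.\ \mathsf{isLock}(v, K) \vdash K * K \Rightarrow \mathsf{False} \\
\quad \land\ \{\mathsf{True}\}\; \mathtt{newLock}\ ()\; \{v.\ \exists K.\ \mathsf{isLock}(v, K)\} \\
\quad \land\ \forall v.\ \{\mathsf{isLock}(v, K)\}\; \mathtt{acquire}\ v\; \{v.\ K\} \\
\quad \land\ \forall v.\ \{\mathsf{isLock}(v, K) * K\}\; \mathtt{release}\ v\; \{\mathsf{True}\}
\end{array}
\]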


The intuitive reading of such a specification is: 


— Calling newLock will lead to the duplicable knowledge of the return value v 
being a lock. 

— Knowing that a value is a lock, a thread can try to acquire the lock and when 
it eventually succeeds it will get the key K. 

— Only a thread holding this key is allowed to call release. 


Sharing of the lock among several threads is achieved by the isLock predicate 
being duplicable. Mutual exclusion is ensured by the last bullet point together 
with the requirement of K being non-duplicable whenever we have isLock(v, K). 
For a leisurely introduction to such specifications, the reader may consult Birkedal 
and Bizjak [1]. 

Let us now return to the distributed lock synchronization. To give clients 
the possibility of interacting with the lock server as they would with such a 
concurrent lock module, the specification for the lock server is as follows. 


\[
\{K * \mathsf{Static}(a, A, \Phi_{lock})\}\; \langle n; \mathtt{lockserver}\ a \rangle\; \{\mathsf{False}\}.
\]


This specification simply states that a lock server should have a primordial 
protocol $\Phi_{lock}$ and that it needs the key resource to begin with. To allow for the 
desired interaction with the server, we define the socket protocol $\Phi_{lock}$ as follows: 


\[
\begin{array}{rcl}
acq(m, \Psi) &\triangleq& (body(m) = \texttt{"LOCK"}) * {} \\
 && \forall m'.\ (body(m') = \texttt{"NO"}) \lor (body(m') = \texttt{"YES"} * K) \mathrel{-\!\!*} \Psi(m') \\
rel(m, \Psi) &\triangleq& (body(m) = \texttt{"RELEASE"}) * K * {} \\
 && \forall m'.\ (body(m') = \texttt{"RELEASED"}) \mathrel{-\!\!*} \Psi(m') \\
\Phi_{lock}(m) &\triangleq& \exists \Psi.\ from(m) \Mapsto \Psi * (acq(m, \Psi) \lor rel(m, \Psi))
\end{array}
\]


The protocol $\Phi_{lock}$ demands that a client of the lock has to be bound to some 
protocol $\Psi$ and that the server can receive two types of messages fulfilling either 
$acq(m, \Psi)$ or $rel(m, \Psi)$. These correspond to the module’s two methods acquire 
and release respectively. In the case of a "LOCK" message, the server will answer 
either "NO" or "YES" along with the key resource. In either case, the answer should 
suffice for fulfilling the client protocol $\Psi$. 
Receiving a "RELEASE" request is similar, but the important part is that we 
require a client to send the key resource K along with the message, which ensures 
that only the current holder can release the lock. 

One difference between the distributed and the concurrent specification is 
that we allow for the distributed lock to directly deny access. The client can use 
a simple loop, asking for the lock until it is acquired, if it wishes to wait until 
the lock can be acquired. 
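
The retry loop mentioned above can be sketched as follows. This Python fragment is an illustration only: `acquire` and its parameter `send_request` (a callable that sends one message body and returns the reply body) are hypothetical, and a real client would additionally have to cope with dropped messages, which the sketch ignores.

```python
def acquire(send_request, max_attempts=None):
    """Ask for the lock until the server answers "YES".

    Returns the number of attempts that were needed, or None if
    `max_attempts` requests were all denied.
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        if send_request("LOCK") == "YES":
            return attempts
    return None


# Stand-in for the server: denies the first two requests, then grants the lock.
canned_replies = iter(["NO", "NO", "YES"])
attempts_needed = acquire(lambda msg: next(canned_replies))
```

The optional bound `max_attempts` is a design choice of this sketch; without it the loop waits indefinitely, which matches the blocking behaviour of `acquire` in the concurrent specification.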

There are several interesting observations one can make about the lock server 
example: (1) The lock server can allocate, read, and write node-local references 
but these are hidden in the specification. (2) There are no channel descriptors 
or assertions on the socket in the code. (3) The lock server provides mutual 
exclusion by requiring clients to satisfy a sufficient protocol. 


3 AnerisLang 


AnerisLang is an untyped functional language with higher-order functions, fork- 
based concurrency, higher-order mutable references, and primitives for communi- 
cating over network sockets. The syntax is as follows: 


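$$
\begin{aligned}
v \in Val ::={} & () \mid b \mid i \mid s \mid \ell \mid z \mid \texttt{rec}\ f\ x = e \mid \dots \\
e \in Expr ::={} & v \mid x \mid \texttt{rec}\ f\ x = e \mid e_1\ e_2 \mid \texttt{ref}\ e \mid {!}\,e \mid e_1 \leftarrow e_2 \mid \texttt{cas}\ e_1\ e_2\ e_3 \\
\mid{} & \texttt{find}\ e_1\ e_2\ e_3 \mid \texttt{substring}\ e_1\ e_2\ e_3 \mid \texttt{i2s}\ e \mid \texttt{s2i}\ e \\
\mid{} & \texttt{fork}\ \{e\} \mid \texttt{start}\ \{n; ip; e\} \mid \texttt{makeaddress}\ e_1\ e_2 \\
\mid{} & \texttt{socket}\ e \mid \texttt{socketbind}\ e_1\ e_2 \mid \texttt{sendto}\ e_1\ e_2\ e_3 \mid \texttt{receivefrom}\ e \mid \dots
\end{aligned}
$$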


We omit the usual operations on pairs, sums, booleans $b \in \mathbb{B}$, and integers 
$i \in \mathbb{Z}$ which are all standard. We introduce the following syntactic sugar: lambda 
abstractions $\lambda x.\, e$ defined as rec _ x = e, let-bindings let x = e1 in e2 defined as 
$(\lambda x.\, e_2)(e_1)$, and sequencing e1; e2 defined as let _ = e1 in e2. 

We have the usual operations on locations $\ell \in Loc$ in the heap: ref v for 
allocating a new reference, ! $\ell$ for dereferencing, and $\ell \leftarrow v$ for assignment. 
cas $\ell$ $v_1$ $v_2$ is an atomic compare-and-set operation used to achieve synchronization 
between threads on a specific memory location $\ell$. Operationally, it tests whether $\ell$ 
has value $v_1$ and if so, updates the location to $v_2$, returning a boolean indicating 
whether the swap succeeded or not. 
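
As a rough illustration of the heap operations just described, the following Python sketch models a node-local heap with `ref`, `load` (for `!`), `store` (for assignment), and `cas`. The class `Heap` and its method names are hypothetical; a lock stands in for the atomicity that the AnerisLang semantics gives each individual step.

```python
import threading
from typing import Any, Dict


class Heap:
    """Toy model of a node-local heap (hypothetical names)."""

    def __init__(self) -> None:
        self._cells: Dict[int, Any] = {}
        self._next = 0
        self._lock = threading.Lock()  # stands in for atomicity of one step

    def ref(self, v: Any) -> int:
        """Allocate a fresh location initialised to v."""
        with self._lock:
            loc = self._next
            self._next += 1
            self._cells[loc] = v
            return loc

    def load(self, loc: int) -> Any:
        with self._lock:
            return self._cells[loc]

    def store(self, loc: int, v: Any) -> None:
        with self._lock:
            self._cells[loc] = v

    def cas(self, loc: int, expected: Any, new: Any) -> bool:
        """Atomically replace `expected` by `new`; report whether it happened."""
        with self._lock:
            if self._cells[loc] == expected:
                self._cells[loc] = new
                return True
            return False
```

A failed `cas` leaves the location untouched, which is what lets several threads race on the same location and learn which one won.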

The operation find finds the index of a particular substring in a string $s \in String$ 
and substring splits a string at given indices, producing the corresponding 
substring. i2s and s2i convert between integers and strings. These operations 
are mainly used for serialization and deserialization purposes. 
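
To make the serialization use case concrete, here is a small Python sketch of how a pair of integers might be encoded into a message body and decoded again. The wire format (two decimal numbers separated by `_`) and the function names are assumptions made for this example, not the format used in the paper; Python's `str.find`, slicing, `str`, and `int` play the roles of find, substring, i2s, and s2i.

```python
from typing import Tuple

SEPARATOR = "_"  # assumed separator, chosen for this sketch only


def serialize_pair(a: int, b: int) -> str:
    """Encode two integers as one string (role of i2s plus concatenation)."""
    return str(a) + SEPARATOR + str(b)


def deserialize_pair(body: str) -> Tuple[int, int]:
    """Decode the string again (roles of find, substring and s2i)."""
    idx = body.find(SEPARATOR)
    if idx < 0:
        raise ValueError(f"malformed message body: {body!r}")
    return int(body[:idx]), int(body[idx + 1:])
```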

The expression fork {e} forks off a new (node-local) thread and start {n; ip; e} 
will spawn a new node $n \in Node$ with ip address $ip \in Ip$ running the program e. 
Note that it is only at the bootstrapping phase of a distributed system that a 
special system-node G will be able to spawn nodes. 

We use $z \in Handle$ to range over socket handles created by the socket 
operation. makeaddress constructs an address given an ip address and a port, 
and the network primitives socketbind, sendto, and receivefrom correspond to 
the similar BSD-socket API methods. 


Operational Semantics. We define the operational semantics of AnerisLang in 
three stages. 

We first define a node-local, thread-local, head step reduction $(e, h) \rightsquigarrow (e', h')$ 
for $e, e' \in Expr$ and $h, h' \in Loc \overset{fin}{\rightharpoonup} Val$ that handles all pure and heap-related 
node-local reductions. All rules of the relation are standard. 

Next, the node-local head step reduction induces a network-aware head step 
reduction $(\langle n; e\rangle, \Sigma) \to (\langle n; e'\rangle, \Sigma')$. 


$$
\frac{(e, h) \rightsquigarrow (e', h')}
     {(\langle n; e\rangle, (H[n \mapsto h], S, P, M)) \to (\langle n; e'\rangle, (H[n \mapsto h'], S, P, M))}
$$


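Here $n \in Node$ denotes a node identifier and $\Sigma, \Sigma' \in NetworkState$ the global 
network state. Elements of NetworkState are tuples (H, S, P, M) tracking heaps 
$H \in Node \overset{fin}{\rightharpoonup} Heap$ and sockets $S \in Node \overset{fin}{\rightharpoonup} Handle \overset{fin}{\rightharpoonup} Option\ Address$ for all 
nodes, ports in use $P \in Ip \overset{fin}{\rightharpoonup} \wp^{fin}(Port)$, and messages sent $M \in Id \overset{fin}{\rightharpoonup} Message$. 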
The induced network-aware reduction is furthermore extended with rules for 
the network primitives as seen in Fig. 3. The socket operation allocates a new 
unbound socket using a fresh handle z for a node n and socketbind binds a 
socket address a to an unbound socket z if the address and port p are not already 
in use. Hereafter, the port is no longer available in $P'(ip)$. For bound sockets, 
sendto sends a message msg to a destination address to from the sender's address 
from found in the bound socket. The message is assigned a unique identifier and 
tagged with a status flag SENT indicating that the message has been sent and 
not received. The operation returns the number of characters sent. 


$$
\frac{z \notin dom(S(n)) \qquad S' = S[n \mapsto S(n)[z \mapsto None]]}
     {(\langle n; \texttt{socket}\ ()\rangle, (H, S, P, M)) \to (\langle n; z\rangle, (H, S', P, M))}
$$

$$
\frac{S(n)(z) = None \qquad p \notin P(ip) \qquad S' = S[n \mapsto S(n)[z \mapsto Some\ (ip, p)]] \qquad P' = P[ip \mapsto P(ip) \cup \{p\}]}
     {(\langle n; \texttt{socketbind}\ z\ (ip, p)\rangle, (H, S, P, M)) \to (\langle n; 0\rangle, (H, S', P', M))}
$$

$$
\frac{S(n)(z) = Some\ from \qquad i \notin dom(M) \qquad M' = M[i \mapsto (from, to, msg, \mathrm{SENT})]}
     {(\langle n; \texttt{sendto}\ z\ msg\ to\rangle, (H, S, P, M)) \to (\langle n; |msg|\rangle, (H, S, P, M'))}
$$

$$
\frac{S(n)(z) = Some\ to \qquad M(i) = (from, to, msg, \mathrm{SENT}) \qquad M' = M[i \mapsto (from, to, msg, \mathrm{RECEIVED})]}
     {(\langle n; \texttt{receivefrom}\ z\rangle, (H, S, P, M)) \to (\langle n; Some\ (msg, from)\rangle, (H, S, P, M'))}
$$

$$
\frac{S(n)(z) = Some\ to}
     {(\langle n; \texttt{receivefrom}\ z\rangle, (H, S, P, M)) \to (\langle n; None\rangle, (H, S, P, M))}
$$


Fig. 3. An excerpt of the rules for network-aware head reduction. 

To model possibly dropped or delayed messages we introduce two rules for 
receiving messages using the receivefrom operation that on a bound socket either 
returns a previously unreceived message or nothing. If a message is received, the 
status flag of the message is updated to RECEIVED. 
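
The network rules can be read as operations on the state components S, P and M. The Python sketch below is one possible executable reading, under several assumptions: the class `Network` and the flag `may_drop` are hypothetical, nodes are identified by strings, handles and message identifiers are integers, and a configuration that would be stuck in the semantics raises `RuntimeError` here. The nondeterministic choice between the two receive rules is approximated by a coin flip.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

Address = Tuple[str, int]  # (ip, port)


@dataclass
class Network:
    sockets: Dict[str, Dict[int, Optional[Address]]] = field(default_factory=dict)  # S
    ports: Dict[str, Set[int]] = field(default_factory=dict)                        # P
    messages: Dict[int, List] = field(default_factory=dict)                         # M

    def socket(self, n: str) -> int:
        handles = self.sockets.setdefault(n, {})
        z = len(handles)          # fresh: handles are never removed
        handles[z] = None         # the new socket is unbound
        return z

    def socketbind(self, n: str, z: int, addr: Address) -> int:
        ip, p = addr
        if self.sockets[n][z] is not None:
            raise RuntimeError("socket is already bound")
        if p in self.ports.setdefault(ip, set()):
            raise RuntimeError("port is already in use")
        self.sockets[n][z] = addr
        self.ports[ip].add(p)
        return 0

    def sendto(self, n: str, z: int, msg: str, to: Address) -> int:
        frm = self.sockets[n][z]
        if frm is None:
            raise RuntimeError("socket is not bound")
        i = len(self.messages)    # fresh message identifier
        self.messages[i] = [frm, to, msg, "SENT"]
        return len(msg)

    def receivefrom(self, n: str, z: int, may_drop: bool = True):
        to = self.sockets[n][z]
        if to is None:
            raise RuntimeError("socket is not bound")
        pending = [i for i, m in self.messages.items()
                   if m[1] == to and m[3] == "SENT"]
        # Either rule may fire: return nothing, or deliver an unreceived message.
        if not pending or (may_drop and random.random() < 0.5):
            return None
        i = random.choice(pending)
        frm, _to, msg, _status = self.messages[i]
        self.messages[i][3] = "RECEIVED"
        return (msg, frm)
```

Passing `may_drop=False` forces delivery whenever a message is pending, which is convenient for deterministic experiments but is not something the semantics itself promises.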

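Third and finally, using standard call-by-value right-to-left evaluation contexts 
$K \in Ectx$ we lift the node-local head reduction to a distributed systems reduction 
$\twoheadrightarrow$ shown below. We write $\twoheadrightarrow^{*}$ for its reflexive-transitive closure. The distributed 
systems relation reduces by picking a thread on any node or forking off a new 
thread on a node. 


$$
\frac{(\langle n; e\rangle, \Sigma) \to (\langle n; e'\rangle, \Sigma')}
     {(T_1 \mathbin{+\!\!+} [\langle n; K[e]\rangle] \mathbin{+\!\!+} T_2, \Sigma) \twoheadrightarrow (T_1 \mathbin{+\!\!+} [\langle n; K[e']\rangle] \mathbin{+\!\!+} T_2, \Sigma')}
$$

$$
(T_1 \mathbin{+\!\!+} [\langle n; K[\texttt{fork}\ \{e\}]\rangle] \mathbin{+\!\!+} T_2, \Sigma) \twoheadrightarrow (T_1 \mathbin{+\!\!+} [\langle n; K[()]\rangle] \mathbin{+\!\!+} T_2 \mathbin{+\!\!+} [\langle n; e\rangle], \Sigma)
$$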


4 The Aneris Logic 


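As a consequence of building on the Iris framework, the Aneris logic features all 
the usual connectives and rules of higher-order separation logic, some of which 
are shown in the grammar below.⁴ The full expressiveness of the logic can be 
exploited when giving specifications to programs or stating protocols. 


$$
\begin{aligned}
P, Q \in iProp ::={} & True \mid False \mid P \land Q \mid P \lor Q \mid P \Rightarrow Q \mid \forall x.\, P \mid \exists x.\, P \mid P * Q \mid P \mathrel{-\!\!*} Q \mid t = u \\
\mid{} & \ell \mapsto_n v \mid \boxed{P} \mid [a]^{\gamma} \mid \{P\}\ \langle n; e\rangle\ \{x.\, Q\} \mid \dots
\end{aligned}
$$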


Note that in Aneris the usual points-to connective about the heap, $\ell \mapsto_n v$, is 
indexed by a node identifier $n \in Node$, asserting ownership of the singleton heap 
mapping $\ell$ to v on node n. 

The logic features (impredicative) invariants $\boxed{P}$ and user-definable ghost state 
via the proposition $[a]^{\gamma}$, which asserts ownership of a piece of ghost state a at 
ghost location $\gamma$. The logical support for user-defined invariants and ghost state 
allows one to relate (ghost and physical) resources to each other; this is vital for 
our specifications as will become evident in Sect. 5 and Sect. 6. We refer to Jung 
et al. [14] for a more thorough treatment of user-defined ghost state. 

To reason about AnerisLang programs, the logic features Hoare triples.⁵ The 
intuitive reading of the Hoare triple $\{P\}\ \langle n; e\rangle\ \{x.\, Q\}$ is that if the program e on 


4 To avoid the issue of reentrancy, invariants are annotated with a namespace and 
Hoare triples with a mask. We omit both for the sake of presentation as they are 
orthogonal issues. 

5 In both Iris and Aneris the notion of a Hoare triple is defined in terms of a weakest 
precondition but this will not be important for the remainder of this paper. 


node n is run in a distributed system s satisfying P, then the computation does 
not get stuck and, moreover, if it terminates with a value v and in a system s', 
then s' satisfies Q[v/x]. In other words, a Hoare triple implies safety and states 
that all spatial resources that are used by e are contained in the precondition P. 

In contrast to spatial propositions that express ownership, e.g., $\ell \mapsto_n v$, 
propositions like $\boxed{P}$ and $\{P\}\ \langle n; e\rangle\ \{x.\, Q\}$ express knowledge of properties that, 
once true, hold true forever. We call this class of propositions persistent. Persistent 
propositions P can be freely duplicated: $P \dashv\vdash P * P$. 


4.1 The Program Logic 


The Aneris proof rules include the usual rules of concurrent separation logic for 
Hoare triples, allowing formal reasoning about node-local pure computations, 
manipulations of the heap, and forking of threads. Expressions e are annotated 
with a node identifier n, but the rules are otherwise standard. 

To reason about individual nodes in a distributed system in isolation, Aneris 
introduces the following rule: 


START 

$$
\frac{\{P * IsNode(n) * FreePorts(ip, \mathfrak{P})\}\ \langle n; e\rangle\ \{True\}}
     {\{P * FreeIp(ip)\}\ \langle G; \texttt{start}\ \{n; ip; e\}\rangle\ \{x.\, x = ()\}}
$$


where $\mathfrak{P} = \{p \mid 0 \le p \le 65535\}$. This rule is the key rule allowing node-local 
reasoning; the rule expresses exactly that to reason about a distributed system it 
suffices to reason about each node in isolation. 

As described in Sect. 3, only the distinguished system node G can start new 
nodes; this is also reflected in the START-rule. In order to start a new node 
associated with IP address ip, the resource FreeIp(ip) is provided. This indicates 
that ip is not used by other nodes. When reasoning about the node n, the proof 
can rely on all ports on ip being available. The resource IsNode(n) indicates that 
the node n is a valid node in the system and keeps track of abstract state related 
to the modeling of node n's heap and sockets. IsNode(n) is persistent and hence 
duplicable. 


Network Communication. To reason about network communication in a dis- 
tributed system, the logic includes a series of rules for reasoning about socket 
manipulation: allocation of sockets, binding of addresses to sockets, sending via 
sockets, and receiving from sockets. 

To allocate a socket it suffices to prove that the node n is valid by providing 
the IsNode(n) resource. In return, an unbound socket resource $z \hookrightarrow_n None$ is 
given. 


SOCKET 

$$
\{IsNode(n)\}\ \langle n; \texttt{socket}\ ()\rangle\ \{z.\, z \hookrightarrow_n None\}
$$


The socket resource $z \hookrightarrow_n o$ keeps track of the address associated with the 
socket handle z on node n and takes part in ensuring that the socket is bound 
only once. It behaves similarly to the points-to connective for the heap, e.g., 
$z \hookrightarrow_n o * z \hookrightarrow_n o' \Rightarrow False$. 

As briefly touched upon in Sect. 2, the logic offers two different rules for 
binding an address to a socket, depending on whether or not the address has a 
primordial, agreed-upon protocol at the level of the logic. To distinguish between 
such static and dynamic addresses, we use a persistent resource Fixed(A) to keep 
track of the set of addresses that have a fixed socket protocol. 

To reason about a static address binding to a socket z it suffices to show that 
the address a being bound has a fixed interpretation (by being in the “fixed” set), 
that the port of the address is free, and that the socket is not bound. 


SOCKETBIND-STATIC 

$$
\{Fixed(A) * a \in A * FreePort(a) * z \hookrightarrow_n None\}\ \langle n; \texttt{socketbind}\ z\ a\rangle\ \{x.\, x = 0 * z \hookrightarrow_n Some\ a\}
$$


In accordance with the BSD-socket API, the bind operation returns the integer 0 
and the socket resource gets updated, reflecting the fact that the binding took 
place. 

The rule for dynamic address binding is similar but the address a should not 
have a fixed interpretation. Moreover, the user of the logic is free to pick the 
socket protocol $\Phi$ to govern address a. 


SOCKETBIND-DYNAMIC 

$$
\{Fixed(A) * a \notin A * FreePort(a) * z \hookrightarrow_n None\}\ \langle n; \texttt{socketbind}\ z\ a\rangle\ \{x.\, x = 0 * z \hookrightarrow_n Some\ a * a \Mapsto \Phi\}
$$


To reason about sending a message on a socket z it suffices to show that z is 
bound, that the destination of the message is governed by a protocol $\Phi$, and that 
the message satisfies the protocol. 


SENDTO 

$$
\{z \hookrightarrow_n Some\ from * to \Mapsto \Phi * \Phi((from, to, msg, \mathrm{SENT}))\}\ \langle n; \texttt{sendto}\ z\ msg\ to\rangle\ \{x.\, x = |msg| * z \hookrightarrow_n Some\ from\}
$$


Finally, to reason about receiving a message on a socket z the socket must be 
bound to an address governed by a protocol $\Phi$. 


RECEIVEFROM 

$$
\begin{array}{l}
\{z \hookrightarrow_n Some\ to * to \Mapsto \Phi\} \\
\quad \langle n; \texttt{receivefrom}\ z\rangle \\
\{x.\, z \hookrightarrow_n Some\ to * (x = None \lor (\exists m.\ x = Some\ (body(m), from(m)) * \Phi(m) * R(m)))\}
\end{array}
$$

When trying to receive a message on a socket, either a message will be received 
or no message is available. This is reflected directly in the logic: if no message 
was received, no resources are obtained. If a message m is received, the resources 
prescribed by $\Phi(m)$ are transferred together with an unmodifiable certificate R(m) 
accounting logically for the fact that message m was received. This certificate 
can be used in the logic to talk about messages that have actually been received, 
in contrast to arbitrary messages. In our specification of the two-phase commit 
protocol presented in Sect. 6, the notion of a vote denotes not just a message 
with the right content but only one that has been sent by a participant and 
received by the coordinator. 


4.2 Adequacy for Aneris 


We now state a formal adequacy theorem, which expresses that Aneris guarantees 
both safety and that all protocols are adhered to. 

To state our theorem we introduce a notion of initial state coherence: A 
set of addresses $A \subseteq Address = Ip \times Port$ and a map $P : Ip \overset{fin}{\rightharpoonup} \wp^{fin}(Port)$ are 
said to satisfy initial state coherence if the following hold: (1) if $(i, p) \in A$ then 
$i \in dom(P)$, and (2) if $i \in dom(P)$ then $P(i) = \emptyset$. 
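
The two coherence conditions are simple enough to check mechanically. The Python sketch below does so for finite inputs; the function name `initial_state_coherent` and the representation (a set of `(ip, port)` pairs for A and a dictionary from IP addresses to sets of ports for P) are assumptions made for this example.

```python
from typing import Dict, Set, Tuple


def initial_state_coherent(A: Set[Tuple[str, int]], P: Dict[str, Set[int]]) -> bool:
    """Check conditions (1) and (2) of initial state coherence."""
    # (1) every fixed address uses an IP address known to P
    ips_known = all(ip in P for (ip, _port) in A)
    # (2) no port is in use initially
    no_ports_used = all(len(used) == 0 for used in P.values())
    return ips_known and no_ports_used
```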


Theorem 1 (Adequacy). Let $\varphi$ be a first-order predicate over values, i.e., 
a meta logic predicate (as opposed to Iris predicates), let P be a map $Ip \overset{fin}{\rightharpoonup} \wp^{fin}(Port)$, 
and $A \subseteq Address$ such that A and P satisfy initial state coherence. 
Given a primordial socket protocol $\Phi_a$ for each $a \in A$, suppose that the Hoare 
triple 

$$
\Big\{Fixed(A) * \mathop{\Large *}_{a \in A} a \Mapsto \Phi_a * \mathop{\Large *}_{i \in dom(P)} FreeIp(i)\Big\}\ \langle n_1; e\rangle\ \{v.\, \varphi(v)\}
$$

is derivable in Aneris. 
If we have 

$$
([\langle n_1; e\rangle], (\emptyset, \emptyset, P, \emptyset)) \twoheadrightarrow^{*} ([\langle n_1; e_1\rangle, \langle n_2; e_2\rangle, \dots, \langle n_m; e_m\rangle], \Sigma)
$$

then the following properties hold: 


1. If $e_1$ is a value, then $\varphi(e_1)$ holds at the meta-level. 
2. Each $e_i$ that is not a value can make a node-local, thread-local reduction step. 


Given predefined socket protocols for all primordial protocols and the necessary 
free IP addresses, this theorem provides the normal adequacy guarantees of Iris- 
like logics, namely safety, i.e., that nodes and threads on nodes cannot get stuck 
and that the postcondition holds for the resulting value. Notice, however, that 
this theorem also implies that all nodes adhere to the agreed upon protocols; 
otherwise, a node not adhering to a protocol would be able to cause another 
node to get stuck, which the adequacy theorem explicitly guarantees against. 


5 Case Study 1: A Load Balancer 


AnerisLang supports concurrent execution of threads on nodes through the 
fork {e} primitive. We will illustrate the benefits of node-local concurrency 
by presenting an example of server-side load balancing. 


[Figure 4 diagram: clients, load balancer, and servers. Legend: socket, node, communication, thread.] 


Fig. 4. The architecture of a distributed system with a load balancer and two servers. 


Implementation. In the case of server-side load balancing, the work distribution 
is implemented by a program listening on a socket that clients send their requests 
to. The program forwards the requests to an available server, waits for the 
response from the server, and sends the answer back to the client. In order to 
handle requests from several clients simultaneously, the load balancer can employ 
concurrency by forking off a new thread for every available server in the system 
that is capable of handling such requests. Each of these threads will then listen 
for and forward requests. The architecture of such a system with two servers and 
n clients is illustrated in Fig. 4. 

An implementation of a load balancer is shown in Fig. 5. The load balancer is 
parameterized over an IP address, a port, and a list of servers. It creates a socket 
(corresponding to $z_0$ in Fig. 4), binds the address, and folds a function over the 
list of servers. This function forks off a new thread (corresponding to $T_1$ and $T_2$ 
in Fig. 4) for each server that runs the serve function with the newly-created 
socket, the given IP address, a fresh port number, and a server as arguments. 

The serve function creates a new socket (corresponding to $z_1$ and $z_2$ in Fig. 4), 
binds the given address to the socket, and continuously tries to receive a client 
request on the main socket ($z_0$) given as input. If a request is received, it forwards 
the request to its server and waits for an answer. The answer is passed on to 
the client via the main socket. In this way, the entire load balancing process is 
transparent to the client: because the load balancer simply relays requests and 
responses, the client's view is the same as if it were communicating with a single 
server handling all requests itself. 
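
The relay structure can be imitated in Python with threads and in-memory queues. This is only a sketch of the idea and not a translation of the AnerisLang implementation: there are no sockets, servers are plain Python callables, a request is a pair of a body and a reply queue (standing in for the message body and the sender address), and the `None` sentinel that stops a worker is an addition of this sketch, since the loop in the paper's implementation never returns.

```python
import queue
import threading
from typing import Callable, List


def serve(main: "queue.Queue", server: Callable[[str], str]) -> None:
    """One worker: take a request from the shared queue and relay the answer."""
    while True:
        item = main.get()
        if item is None:          # shutdown sentinel (assumption of this sketch)
            return
        body, reply_to = item
        reply_to.put(server(body))


def load_balancer(main: "queue.Queue",
                  servers: List[Callable[[str], str]]) -> List[threading.Thread]:
    """Fork one worker thread per server, all reading from the same queue."""
    workers = [threading.Thread(target=serve, args=(main, srv)) for srv in servers]
    for w in workers:
        w.start()
    return workers
```

A client that puts `(body, reply_queue)` on `main` and then reads from `reply_queue` cannot tell which worker handled the request, which is the transparency property described above.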


Specification and Protocols. To provide a general, reusable specification of the 
load balancer, we will parameterize its socket protocol by two predicates $P_{in}$ 
and $P_{out}$ that are both predicates on a message m and a meta-language value 
v. The two predicates are application specific and used to give logical accounts 
of the client requests and the server responses, respectively. Furthermore, we 
parameterize the protocol by a predicate $P_{val}$ on a meta-language value that 
will allow us to maintain ghost state between the request and response, as will 
become evident in the following. 


rec load_balancer ip port servers = 
  let skt = socket () in 
  let a = makeaddress ip port in 
  socketbind skt a; 
  listfold (λ server, acc. 
      fork { serve skt ip acc server }; 
      acc + 1) 1100 servers 

rec serve main ip port srv = 
  let skt = socket () in 
  let a = makeaddress ip port in 
  socketbind skt a; 
  (rec loop () = 
     match receivefrom main with 
       SOME m => 
         sendto skt (π1 m) srv; 
         let res = π1 (listenwait skt) in 
         sendto main res (π2 m); loop () 
     | NONE => loop () 
     end) () 


Fig. 5. An implementation of a load balancer in AnerisLang. listfold and listenwait 
are convenient helper functions available in the appendix [20]. 

In our specification, the sockets where the load balancer and the servers 
receive requests (the blue sockets in Fig. 4) will all be governed by the same 
socket protocol $\Phi_{rel}$, such that the load balancer may seamlessly relay requests 
and responses between the main socket and the servers, without invalidating any 
socket protocols. We define the generic relay socket protocol $\Phi_{rel}$ as follows: 


$$
\Phi_{rel}(P_{val}, P_{in}, P_{out})(m) \triangleq \exists \Psi, v.\ from(m) \Mapsto \Psi * P_{in}(m, v) * P_{val}(v) * \big(\forall m'.\ P_{val}(v) * P_{out}(m', v) \mathrel{-\!\!*} \Psi(m')\big)
$$


When verifying a request, this protocol demands that the sender (corresponding 
to the red sockets in Fig. 4) is governed by some protocol $\Psi$, that the request 
fulfills the $P_{in}$ and $P_{val}$ predicates, and that $\Psi$ is satisfied given a response that 
maintains $P_{val}$ and satisfies $P_{out}$. 

When verifying the load balancer receiving a request m from a client, we 
obtain the resources $P_{in}(m, v)$ and $P_{val}(v)$ for some v according to $\Phi_{rel}$. This 
suffices for passing the request along to a server. However, to forward the server's 
response to the client we must know that the server behaves faithfully and 
gave us the response to the right request value v. $\Phi_{rel}$ does not give us this 
immediately as the v is existentially quantified. Hence we define a ghost resource 
$LB(\pi, s, v)$ that provides fractional ownership for $\pi \in (0, 1]$, which satisfies 
$LB(1, s, v) \dashv\vdash LB(\tfrac{1}{2}, s, v) * LB(\tfrac{1}{2}, s, v)$, and for which v can only get updated if 
$\pi = 1$ and in particular $LB(\pi, s, v) * LB(\pi, s, v') \Rightarrow v = v'$ for any $\pi$. Using 
this resource, the server with address s will have $P_{LB}(s)$ as its instantiation of 
$P_{val}$ where 


$$
P_{LB}(s)(v) \triangleq LB(\tfrac{1}{2}, s, v).
$$


When verifying the load balancer, we will update this resource to the request 
value v when receiving a request (as we have the full fraction) and transfer 
$LB(\tfrac{1}{2}, s, v)$ to the server with address s handling the request and, according to 
$\Phi_{rel}$, it will be required to send it back along with the result. Since the server 
logically only gets half ownership, the value cannot be changed. Together with 
the fact that v is also an argument to $P_{in}$ and $P_{out}$, this ensures that the server 
fulfills $P_{out}$ for the same value as it received $P_{in}$ for. The socket protocol for the 
serve function's socket ($z_1$ and $z_2$ in Fig. 4) that communicates with a server 
with address s can now be stated as follows. 


$$
\Phi_{serve}(s, P_{out})(m) \triangleq \exists v.\ LB(\tfrac{1}{2}, s, v) * P_{out}(m, v)
$$
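
The discipline imposed by the LB resource (split into halves, agree on the value, update only with the full fraction) can be mimicked at run time. The Python sketch below is an analogy only: the class `LB` and its methods are hypothetical, ghost state in the logic has no run-time representation, and Python objects can be copied freely, so the checks here approximate ownership rather than enforce it.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Any, Tuple


@dataclass(frozen=True)
class LB:
    frac: Fraction   # the share owned, in (0, 1]
    server: str      # the server address s
    value: Any       # the request value v

    def split(self) -> Tuple["LB", "LB"]:
        """Split a share into two equal shares of the same value."""
        half = self.frac / 2
        return LB(half, self.server, self.value), LB(half, self.server, self.value)

    def combine(self, other: "LB") -> "LB":
        """Join two shares; they must agree on the server and the value."""
        if self.server != other.server:
            raise ValueError("shares belong to different servers")
        if self.value != other.value:
            raise ValueError("shares disagree on the value")
        total = self.frac + other.frac
        if total > 1:
            raise ValueError("shares add up to more than the whole")
        return LB(total, self.server, self.value)

    def update(self, new_value: Any) -> "LB":
        """Changing the value requires the full fraction."""
        if self.frac != 1:
            raise PermissionError("only the full share may update the value")
        return LB(self.frac, self.server, new_value)
```

In this reading, the load balancer calls `update` while it holds the full share, hands one half to the server, and can later `combine` the returned half with its own only if the server did not tamper with the value.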


Since all calls to the serve function need access to the main socket in order to 
receive requests, we will keep the socket resource required in an invariant $I_{LB}$ 
which is shared among all the threads: 


$$
I_{LB}(n, z, a) \triangleq \boxed{z \hookrightarrow_n Some\ a}
$$


The specification for the serve function becomes: 

$$
\begin{array}{l}
\{\, I_{LB}(n, main, a_{main}) * Dynamic((ip, p), A) * IsNode(n) * LB(1, s, v) * {} \\
\quad a_{main} \Mapsto \Phi_{rel}(\lambda\_.\, True, P_{in}, P_{out}) * s \Mapsto \Phi_{rel}(P_{LB}(s), P_{in}, P_{out}) \,\} \\
\quad \langle n; \texttt{serve}\ main\ ip\ p\ s\rangle \\
\{False\}
\end{array}
$$


The specification requires the address $a_{main}$ of the socket main to be governed 
by $\Phi_{rel}$ with a trivial instantiation of $P_{val}$ and the address s of the server to 
be governed by $\Phi_{rel}$ with $P_{val}$ instantiated by $P_{LB}$. The specification moreover 
expects resources for a dynamic setup, the invariant that owns the resource 
needed to verify use of the main socket, and a full instance of the LB(1, s, v) 
resource for some arbitrary v. 

With this specification in place the complete specification of our load balancer 
is immediate (note that it is parameterized by $P_{in}$ and $P_{out}$): 


$$
\begin{array}{l}
\Big\{\, Static((ip, p), A, \Phi_{rel}(\lambda\_.\, True, P_{in}, P_{out})) * IsNode(n) * {} \\
\quad \Big(\mathop{\Large *}_{p' \in ports} Dynamic((ip, p'), A)\Big) * {} \\
\quad \Big(\mathop{\Large *}_{s \in srvs} \exists v.\ LB(1, s, v) * s \Mapsto \Phi_{rel}(P_{LB}(s), P_{in}, P_{out})\Big) \,\Big\} \\
\quad \langle n; \texttt{load\_balancer}\ ip\ p\ srvs\rangle \\
\{True\}
\end{array}
$$


where $ports = [1100, \dots, 1100 + |srvs|]$. In addition to the protocol setup for 
each server as just described, for each port $p' \in ports$ which will become the 
endpoint for a corresponding server, we need the resources for a dynamic setup, 
and we need the resource for a static setup on the main input address (ip, p). 

In the accompanying Coq development we provide an implementation of 
the addition service from Sect. 2.3, both in the single server case and in a load 
balanced case. For this particular proof we let the meta-language value v be a 
pair of integers corresponding to the expected arguments. In order to instantiate 
the load balancer specification we choose 


$$
\begin{aligned}
P^{add}_{in}(m, (v_1, v_2)) &\triangleq body(m) = serialize(v_1, v_2) \\
P^{add}_{out}(m, (v_1, v_2)) &\triangleq body(m) = serialize(v_1 + v_2)
\end{aligned}
$$


with serialize being the same serialization function from Sect. 2.3. We build and 
verify two distributed systems, (1) one consisting of two clients and an addition 
server and (2) one including two clients, a load balancer and three addition servers. 
We prove both of these systems safe and the proofs utilize the specifications we 
have given for the individual components. Notice that $\Phi_{rel}(\lambda\_.\, True, P^{add}_{in}, P^{add}_{out})$ 
and $\Phi_{add}$ from Sect. 2.3 are the same. This is why we can use the same client 
specification in both system proofs. Hence, we have demonstrated that Aneris 
supports horizontal composition of the same modules in different systems. 

While the load balancer demonstrates the use of node-local concurrency, its 
implementation does not involve shared memory concurrency, i.e., synchronization 
among the node-local threads. The appendix [20] includes an example of a 
distributed system, where clients interact with a server that implements a bag. 
The server uses multiple threads to handle client requests concurrently and 
the threads use a shared bag data structure governed by a lock. This example 
demonstrates Aneris’ ability to support both shared-memory concurrency and 
distributed networking. 


6 Case Study 2: Two-Phase Commit 


A typical problem in distributed systems is that of consensus and distributed 
commit; an operation should be performed by all participants in a system or none 
at all. The two-phase commit protocol (TPC) by Gray [6] is a classic solution 
to this problem. We study this protocol in Aneris because (1) it is widely used in 
the real world, (2) it is a complex network protocol and thus serves as a decent 
benchmark for reasoning in Aneris, and (3) it shows how an implementation can 
be given a specification that is usable for a client that abstractly relies on some 
consensus protocol. 

The two-phase commit protocol consists of the following two phases, each 
involving two steps: 


1. (a) The coordinator sends out a vote request to each participant. 
   (b) A participant that receives a vote request replies with a vote for either 
       commit or abort. 
2. (a) The coordinator collects all votes and determines a result. If all 
       participants voted commit, the coordinator sends a global commit to all. 
       Otherwise, the coordinator sends a global abort to all. 
   (b) All participants that voted for a commit wait for the final verdict from 
       the coordinator. If the participant receives a global commit it locally 
       commits the transaction, otherwise the transaction is locally aborted. All 
       participants must acknowledge. 
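
The two phases listed above can be sketched as a short, sequential Python program. It is a simplification under stated assumptions: participants are in-process objects instead of nodes, no messages are lost or delayed, and the names `Participant`, `can_commit` and `coordinate` are hypothetical and not taken from the paper's implementation.

```python
from typing import Callable, List


class Participant:
    def __init__(self, can_commit: Callable[[str], bool]) -> None:
        self.can_commit = can_commit   # application-specific vote predicate
        self.state = "INIT"

    def vote(self, request: str) -> str:
        """Step 1(b): answer a vote request with COMMIT or ABORT."""
        ok = self.can_commit(request)
        self.state = "READY" if ok else "ABORTED"
        return "COMMIT" if ok else "ABORT"

    def decide(self, decision: str) -> str:
        """Step 2(b): act on the global decision and acknowledge."""
        if self.state == "READY":
            self.state = "COMMITTED" if decision == "COMMIT" else "ABORTED"
        return "ACK"


def coordinate(request: str, participants: List[Participant]) -> str:
    """Steps 1(a) and 2(a): collect all votes, then broadcast the decision."""
    votes = [p.vote(request) for p in participants]
    decision = "COMMIT" if all(v == "COMMIT" for v in votes) else "ABORT"
    acks = [p.decide(decision) for p in participants]
    if any(a != "ACK" for a in acks):
        raise RuntimeError("a participant failed to acknowledge")
    return decision
```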


Our implementation and specification details can be found in the appendix [20] 
and in the accompanying Coq development, but we will emphasize a few key 
points. 

To provide general, reusable implementations and specifications of the 
coordinator and participants implementing TPC, we do not define what requests, votes, 
or decisions look like. We leave it to a user of the module to provide decidable 
predicates matching the application specific needs and to define the logical, local 
pre- and postconditions, P and Q, of participants for the operation in question. 

Our specifications use fractional ghost resources to keep track of coordinator 
and participant state w.r.t. the coordinator and participant transition systems 
indicated in the protocol description above. Similar to our previous case study, we 
exploit partial ownership to limit when transitions can be made. When verifying 
a participant, we keep track of their state and the coordinator’s state and require 
all participants’ view of the coordinator state to be in agreement through an 
invariant. 

In short, our specification of TPC 


- ensures the participants and coordinator act according to the protocol, i.e., 
  • the coordinator decides based on all the participant votes, 
  • participants act according to the global decision, 
  • if the decision was to commit, we obtain the resources described by Q 
    for all participants, 
  • if the decision was to abort, we still have the resources described by P 
    for all participants, 
- does not require the coordinator to be primordial, so the coordinator could 
  change from round to round. 


6.1 A Replicated Log 


In a distributed replicated logging system, a log is stored on several databases 
distributed across several nodes where the system ensures consistency among the 
logs through a consensus protocol. We have verified such a system implemented 
on top of the TPC coordinator and participant modules to showcase vertical 
composition of complex protocols in Aneris as illustrated in Fig. 6. The blue 
parts of the diagram constitute node-local instantiations of the TPC modules 
invoked by the nodes to handle the consensus process. As noted by Sergey et al. 
[35], clients of core consensus protocols have not received much focus from other 
major verification efforts [7, 30, 40]. 

Fig. 6. The architecture of a replicated logging system implemented using the TPC 
modules (the blue parts of the diagram) with a coordinator and two databases (S1 and 
S2) each storing a copy of the log. 


Our specification of a replicated logging system draws on the generality of the 
TPC specification. In this case, we use fractional ghost state to keep track of two 
related pieces of information. The first keeps a logical account of the log l already 
stored in the database at a node at address a, $LOG(\pi, a, l)$. The second one keeps 
track of what the log should be updated to, if the pending round of consensus 
succeeds. This is a pair of the existing log l and the (pending) change s proposed 
in this round, $PEND(\pi, a, (l, s))$. We exploit fractional resource ownership by 
letting the coordinator, logically, keep half of the pending log resources at all 
times. Together with suitable local pre- and postconditions for the databases, 
this prevents the databases from doing arbitrary changes to the log. Concretely, 
we instantiate P and Q of the TPC module as follows: 


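$$
\begin{aligned}
P_{rep}(p)(m) &\triangleq \exists l, s.\ (m = \texttt{"REQUEST\_"}\ @\ s) * LOG(\tfrac{1}{2}, p, l) * PEND(\tfrac{1}{2}, p, (l, s)) \\
Q_{rep}(p)(n) &\triangleq \exists l, s.\ LOG(\tfrac{1}{2}, p, l\ @\ s) * PEND(\tfrac{1}{2}, p, (l, s))
\end{aligned}
$$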

where @ denotes string concatenation. Note how the request message specifies the 
proposed change (since the string that we would like to add to the log is appended 
to the request message) and how we ensure consistency by making sure the two 
ghost assertions hold for the same log. Even though l and s are existentially 
quantified, we know the logs cannot be inconsistent since the coordinator retains 
partial knowledge of the log. Due to the guarantees given by the TPC specification, 
this implies that if the global decision was to commit a change, this change 
will have happened locally on all databases, cf. $LOG(\tfrac{1}{2}, p, l\ @\ s)$ in $Q_{rep}$, and if 
the decision was to abort, then the log remains unchanged on all databases, 
cf. $LOG(\tfrac{1}{2}, p, l)$ in $P_{rep}$. We refer to the appendix [20] or the Coq development 
for further details. 
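
The all-or-nothing behaviour of the replicated log can be illustrated with a small Python sketch. It is a toy model under assumptions: `Database`, `healthy` and `append_everywhere` are hypothetical names, the commit round is run sequentially in one process, and the ghost resources LOG and PEND are represented by ordinary fields. Only the "REQUEST_" prefix of the request message is taken from the specification above.

```python
from typing import List, Optional, Tuple

PREFIX = "REQUEST_"


class Database:
    def __init__(self) -> None:
        self.log = ""                                   # plays the role of LOG
        self.pending: Optional[Tuple[str, str]] = None  # plays the role of PEND
        self.healthy = True                             # toy reason to vote abort

    def vote(self, msg: str) -> str:
        if not (self.healthy and msg.startswith(PREFIX)):
            return "ABORT"
        self.pending = (self.log, msg[len(PREFIX):])    # the pair (l, s)
        return "COMMIT"

    def decide(self, decision: str) -> None:
        if decision == "COMMIT" and self.pending is not None:
            l, s = self.pending
            self.log = l + s                            # the log becomes l @ s
        self.pending = None


def append_everywhere(dbs: List[Database], s: str) -> bool:
    """Run one commit round proposing to append s to every copy of the log."""
    msg = PREFIX + s
    votes = [db.vote(msg) for db in dbs]                # every database votes
    decision = "COMMIT" if all(v == "COMMIT" for v in votes) else "ABORT"
    for db in dbs:
        db.decide(decision)
    return decision == "COMMIT"
```

After any sequence of rounds, all copies of the log are equal: a committed round appends the same string everywhere, and an aborted round leaves every copy as it was.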


7 Related Work 


Verification of distributed systems has received a fair amount of attention. In 
order to give a better overview, we have divided related work into four categories. 


Model-Checking of Distributed Protocols. Previous work on verification of 
distributed systems has mainly focused on verification of protocols or core network 
components through model-checking. Frameworks for showing safety and liveness 
properties, such as SPIN [9] and TLA+ [23], have had great success. A benefit 
of using model-checking frameworks is that they allow stating both safety and 
liveness assertions as LTL assertions [29]. Mace [17] provides a suite for building 
and model-checking distributed systems with asynchronous protocols, including 
liveness conditions. Chapar [25] allows for model-checking of programs that 
use causally consistent distributed key-value stores. None of these languages 
provides higher-order functions or thread-based concurrency. 


Session Types for Giving Types to Protocols. Session types have been studied for 
a wide range of process calculi, in particular, typed π-calculus. The idea is to 
describe two-party communication protocols as a type to ensure communication 
safety and progress [10]. This has been extended to multi-party asynchronous 
channels [11], multi-role types [2] which informally model topics of actor-based 
message-passing and dependent session types allowing quantification over mes- 
sages [38]. Our socket protocol definitions are quite similar to the multi-party 
asynchronous session types with progress encoded by having suitable ghost- 
assertions and using the magic wand. Actris [8] is a logic for session-type based 
reasoning about message-passing in actor-based languages. 


Hoare Style Reasoning About Distributed Systems. Disel [35] is a Hoare Type 
Theory for distributed program verification in Coq with ideas from separation 
logic. It provides the novel protocol-tailored rules WithInv and Frame which 
allow for modularity of proofs under the condition of an inductive invariant 
and distributed systems composition. In Disel, programs can be extracted into 
runnable OCaml programs, which is on our agenda for future work. 

IronFleet [7] allows for building provably correct distributed systems by 
combining TLA-style state-machine refinement with Hoare-logic verification in a 
layered approach, all embedded in Dafny [24]. IronFleet also allows for liveness 
assertions. For a comparison of Disel and IronFleet to Aneris from a modularity 
point of view we refer to the Introduction section. 


Other Distributed Verification Efforts. Verdi [40] is a framework for writing and 
verifying implementations of distributed algorithms in Coq, providing a novel 
approach to network semantics and fault models. To achieve compositionality, the
authors introduced verified system transformers, that is, functions that transform
one implementation into another implementation with different assumptions
about its environment. This makes vertical composition difficult for clients of
proven protocols, and in comparison AnerisLang seems more expressive.
EventML [30, 31] is a functional language in the ML family that can be used
for coding distributed protocols using high-level combinators from the Logic of
Events, and for verifying them in the Nuprl interactive theorem prover. It is not
quite clear how modular reasoning works, since one works within the model. However,
the notion of a central main observer is akin to our distinguished system node.

8 Conclusion


Distributed systems are ubiquitous and hence it is essential to be able to verify 
them. In this paper we presented Aneris, a framework for writing and verifying 
distributed systems in Coq built on top of the Iris framework. From a programming 
point of view, the important aspect of AnerisLang is that it is feature-rich: it is a 
concurrent ML-like programming language with network primitives. This allows 
individual nodes to internally use higher-order heap and concurrency to write 
efficient programs. 

The Aneris logic provides node-local reasoning through socket protocols. That 
is, we can reason about individual nodes in isolation as we reason about indi- 
vidual threads. We demonstrate the versatility of Aneris by studying interesting 
distributed systems both implemented and verified within Aneris. The adequacy 
theorem of Aneris implies that these programs are safe to run. 


Table 1. Sizes of implementations, specifications, and proofs in lines of code. When
proving adequacy, the system must be closed.


Module                                              Implementation  Specification  Proofs
Load Balancer (Sect. 5)
  Load balancer                                                 18             78      95
Addition Service
  Server                                                        11             15      38
  Client                                                         9             14      26
  Adequacy (1 server, 2 clients)                                 5             12      62
  Adequacy w. Load Balancing (3 servers, 2 clients)             16             28     175
Two-phase commit (Sect. 6)
  Coordinator                                                   18            181     265
  Participant                                                   11                    280
Replicated logging (Sect. 6 + appendix [20])
  Instantiation of TPC                                           -             85       -
  Logger                                                        22             19      95
  Database                                                      24             20     190
  Adequacy (2 dbs, 1 coordinator, 2 clients)                    13              -     137


Relating the verification sizes of the modules from Table 1 to other formal 
verification efforts in Coq indicates that it is easier to specify and verify systems 
in Aneris. The total work required to prove two-phase commit with replicated 
logging is 1,272 lines which is just half of the lines needed for proving the inductive 
invariant for TPC in other works [35]. However, extensive work has gone into
Iris Proof Mode, so it is hard to conclude whether Aneris requires less verification
effort or merely has richer tactics.

Abstract. Probabilistic Programming offers a concise way to represent
stochastic models and perform automated statistical inference. However, 
many real-world models have discrete or hybrid discrete-continuous dis- 
tributions, for which existing tools may suffer non-trivial limitations. 
Inference and parameter estimation can be exceedingly slow for these 
models because many inference algorithms compute results faster (or 
exclusively) when the distributions being inferred are continuous. To 
address this discrepancy, this paper presents Leios. Leios is the first ap- 
proach for systematically approximating arbitrary probabilistic programs 
that have discrete, or hybrid discrete-continuous random variables. The 
approximate programs have all their variables fully continualized. We 
show that once we have the fully continuous approximate program, we 
can perform inference and parameter estimation faster by exploiting the 
existing support that many languages offer for continuous distributions. 
Furthermore, we show that the estimates obtained when performing in- 
ference and parameter estimation on the continuous approximation are 
still comparably close to both the true parameter values and the esti- 
mates obtained when performing inference on the original model. 


Keywords: Probabilistic Programming · Program Transformation · Continuity
· Parameter Synthesis · Program Approximation


1 Introduction 


Probabilistic programming languages (PPLs) offer an intuitive way to model 
uncertainty by representing complex probability models as simple programs [28]. 
A probabilistic programming system then performs fully automated statistical 
inference on this program by conditioning on observed data, to obtain a posterior 
distribution, all while hiding the intricate details of this inference process. 
Probabilistic inference is a computationally hard task, even for programs 
containing only Bernoulli distributions (#P-complete [18]), but prior work has
shown that for many inference algorithms, continuous and smooth distributions 
(such as Gaussians) can be significantly easier to handle than the distributions 
having discrete components or discontinuities in their densities [15, 53, 52, 9, 56].


Fig. 1: Overview of Leios


However, many popular Bayesian models have distributions that are
discrete or hybrid discrete-continuous mixtures (denoted simply as “hybrid”),
which makes inference computationally inefficient for much the same reason.
In particular, when the observed variable is a discrete-continuous mixture, inference
may fail altogether [65]. Likewise, even if the observed variable and likelihood
are continuous, the prior or important latent variables may be discrete (e.g.,
Binomial), leading to an equally difficult discrete inference problem [61,50].

In fact, a number of popular inference algorithms such as Hamiltonian Monte
Carlo [48], NUTS [31,50], or versions of Variational Inference (VI) [9] only work
for restricted classes of programs (e.g. by requiring each latent be continuous) 
to avoid these problems. Furthermore, we cannot always marginalize away the 
program’s discrete component since it is often precisely the one we are interested 
in. Even if the parameter was one which could be safely marginalized out, doing 
so may require the programmer to use advanced domain knowledge to analyti- 
cally solve and obtain a new model and re-write the program completely, which 
can be well beyond the abilities of the average PPL user. 
Problem statement: We address the question of how to accurately approximate
the semantics of a probabilistic program P whose prior or likelihood is
either discrete or hybrid, with a new program $P_C$, where all variables follow
continuous distributions, so that we can exploit the aforementioned inference
algorithms to improve inference in an easy, off-the-shelf fashion.

While a programmer could manually rewrite the probabilistic program or 
model and apply approximations in an ad hoc manner, such as simply adding 
Gaussian noise to each variable, this would be neither sufficient nor wise. For 
instance, it has been shown that when a model contains Gaussians, how they 
are programmatically written and parametrized can impact the inference time and
quality [29,5]. Also, by not correcting for continuity in the program’s branch 
conditions, one could significantly alter the probability of executing a particular 
program branch, and hence alter the overall distribution represented by the 
probabilistic program. 

Leios: We introduce a fully automated program analysis framework to continu- 
alize probabilistic programs for significantly improved inference performance, es- 
pecially in cases where inference was originally intractable or prohibitively slow. 

An input to Leios is a probabilistic program, which consists of (1) a model
that specifies the prior distributions and how the latent variables are related,
(2) specifications of observable variables, and (3) specifications of data sets. Leios
transforms the model, given the set of the observable variables. This model is 
then substituted back into the original program to produce a fully continuous 
probabilistic program leading to greatly improved inference. Furthermore the 
approximated program can easily be reused with different, unseen data. 

Figure 1 presents the main workflow of Leios:

— Distribution transformer and Boolean predicate correction: Leios first finds 
individual discrete distribution sample statements to replace with continu- 
ous approximations based on known convergence theorems that specifically 
match the distributions’ first moments [23]. Leios then performs a dataflow 
analysis to identify and then correct Boolean predicates in branches to best 
preserve the original program’s probabilistic control flow. To correct Boolean 
predicates, we convert the program to a sketch and fill in the predicates with 
holes that will then be synthesized with the optimal values. We ensure that 
the distribution of the model’s observed variables is fully continuous with 
a differentiable density function, by transforming it using an approach that 
adapts Smooth Interpretation [14] to probabilistic programs. We describe 
the transformations in Section 4. 


— Parameter Synthesizer: Leios determines the optimal parameters which min- 
imize a numerical approximation of the Wasserstein Distance to fill in the 
holes in the program sketch. This step of the algorithm can be thought of as 
a “training phase” much like in machine learning, and we need only perform 
it once for a given program, regardless of the number of times we will later 
perform inference on different data sets. These parameters correspond to 
continuity correction factors in classical probability theory [23]. We describe 
the synthesizer in Section 5. 


Contributions: This paper makes the following main contributions: 


— Concept: To the best of our knowledge, Leios is the first technique to auto- 
mate program transformations that approximate discrete or hybrid discrete- 
continuous probabilistic programs with fully continuous ones to improve in- 
ference. It combines insights from probability theory, program analysis, com- 
piler autotuning, and machine learning. 

— Program Transformation: Leios implements a set of transformations on 
distributions and the conditional statements that can produce provably con- 
tinuous probabilistic programs that approximate the original ones. 

— Parameter Synthesis: We present a synthesis algorithm that corrects the 
probabilities of taking specific branches in the probabilistic program and 
improves the overall inference accuracy. 

— Evaluation: We evaluated Leios on a set of ten benchmarks from existing 
literature and two systems, WebPPL (using MCMC sampling) and Pyro 
(using stochastic variational inference). The results demonstrate that Leios 
can achieve a substantial decrease in inference time compared to the origi- 
nal model, while still achieving high inference accuracy. We also show how 
a continualized program allows for easy off-the-shelf inference that is not 
always readily available to discrete or hybrid models.


 1  Data := [12, 8, ...];
 2
 3  Model {
 4    prior = Uniform(20,50);
 5    Recruiters = Poisson(prior);
 6
 7    perfGPA = 4;
 8    regGPA = 4*Beta(7,3);
 9    GPA = Mix(perfGPA,.05,regGPA,.95);
10
11    if (GPA == 4) {
12      Interviews = Bin(Recruiters,.9);
13    } else if (GPA > 3.5) {
14      Interviews = Bin(Recruiters,.6);
15    } else {
16      Interviews = Bin(Recruiters,.5);
17    }
18
19    Offers = Bin(Interviews,0.4);
20  }
21
22  for d in Data {
23    factor(Offers,d);
24  }
25
26  return prior;

(a)

 1  Model {
 3    prior = Uniform(20,50);
 5    Recruiters = Gaussian(mu_p,sigma_p);
 7    perfGPA = Gaussian(4,β);
 8    regGPA = 4*Beta(7,3);
 9    GPA = Mix(perfGPA,.05,regGPA,.95);
    [remaining lines of panel (b) are not recoverable from the source]

(b)

Fig. 2: (a) Program P and (b) the Continualized Model Sketch


2 Example 


Figure 2 (a) presents a program that infers the parameters of the distribution 
modeling the number of recruiters coming to a recruiting fair, given the
number of offers that multiple students receive (line 1). As the number of recruiters
may vary year to year, we model this count as a Poisson distribution (line 5). 
However, to accurately quantify how much this count varies year to year, we 
want to estimate the unknown parameter of this Poisson variable. We thus place 
a uniform prior over this parameter (line 4). 

The example represents the student GPAs in lines 7-9: it is either a perfect 
4.0 score or any number between 0 and 4. We model the perfect GPA with a dis- 
crete distribution that has all the probability mass at 4.0 (line 7). To model the 
imperfect GPA, we use a Beta distribution (line 8), scaled by 4 to lie in the range 
[0.0, 4.0]. Finally, the distribution of the GPAs is a mixture of these two components
(line 9). Our mixture assumes that 5% of students obtain perfect GPAs.

Because the GPA impacts the number of interviews a student receives, our 
model incorporates control flow where each branch captures the distribution 
of interviews received, conditioned on the GPA being in a certain range (lines 
11-17). Each student’s resume is available to all recruiters and each recruiter
can request an interview or not, hence all three of the Interviews distributions
follow a Binomial distribution (here denoted as Bin) with the same n (number of
recruiters) but with different probabilities (higher probabilities for higher GPAs).
From the factor statement (line 23) we see that the Offers variable governs the
distribution of the observed data, hence it is the observed variable. Furthermore,
given the values of all latent variables, Offers follows a Binomial distribution 
(line 19), hence the likelihood function of this program is discrete. 
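
To make the generative story of Figure 2 (a) concrete, the following is a minimal forward-sampling sketch in plain Python. It is not the authors' implementation and performs no inference: it only draws Offers for a fixed Poisson rate. The helper names (`sample_poisson`, `sample_binomial`, `sample_offers`) and the fixed rate of 37 are assumptions made for illustration.

```python
import math
import random

def sample_poisson(rng, lam):
    """Knuth's multiplication method; adequate for moderate rates such as 20..50."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def sample_binomial(rng, n, p):
    """Sum of n independent Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if rng.random() < p)

def sample_offers(rng, rate):
    """One forward run of the model body for a fixed Poisson rate (hypothetical helper)."""
    recruiters = sample_poisson(rng, rate)
    # Hybrid GPA: point mass at 4.0 with weight 0.05, scaled Beta(7, 3) otherwise.
    gpa = 4.0 if rng.random() < 0.05 else 4.0 * rng.betavariate(7, 3)
    if gpa == 4.0:
        interviews = sample_binomial(rng, recruiters, 0.9)
    elif gpa > 3.5:
        interviews = sample_binomial(rng, recruiters, 0.6)
    else:
        interviews = sample_binomial(rng, recruiters, 0.5)
    return sample_binomial(rng, interviews, 0.4)

rng = random.Random(0)
offers = [sample_offers(rng, 37) for _ in range(2000)]
mean_offers = sum(offers) / len(offers)
print(mean_offers)
```

Every draw of Offers is a non-negative integer, which is exactly why the likelihood of the original program is discrete.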

This program poses several challenges for inference. First, it contains discrete
latent variables (such as the Binomials), which are expensive to sample
from and can rule out certain inference methods [26]. Second, it contains a hybrid
discrete-continuous distribution governing the student GPA, and such hybrid 
distributions are challenging for inference algorithms [65]. Third, the model has 
complex control flow introduced by the if statements, making the observable 
data follow a (potentially multimodal) mixture distribution, which is yet an- 
other obstacle to efficient inference [43,17]. Lastly, the discrete distribution of 
the observed data and likelihood also hinder the inference efficiency [61, 50, 59]. 


2.1 Continualization 


Our approach starts from the observation that inference with continuous distri- 
butions is often more efficient for several inference algorithms [53, 52,56]. Leios 
first continualizes discrete and hybrid distributions in the original model. Start- 
ing in line 5 in Figure 2 (b), we approximate the Poisson variable with a Gaussian 
using a classical result [16], hence relaxing the constraint that the number of re- 
cruiters be an integer. (For ease of presentation we created new variables mu_p 
and sigma_p corresponding to the parameters of the approximation; Leios sim- 
ply inlines these.) We next approximate the discrete component of the GPA 
hybrid mixture distribution by a Gaussian centered at 4 with a small tunable standard
deviation $\beta$ (line 7). The GPA is now a mixture of two continuous distributions.
We then transform all of the Binomials to Gaussians (lines 14, 18, 22,
and 26) using another classic approximation [23].
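
The two classical approximations used here amount to matching the mean and variance of the discrete distribution. The sketch below spells this out; the function names are hypothetical, and the parameter choices mirror the `sqrt(prior)` and `sqrt(Recruiters*0.6*0.4)` expressions that appear in the continualized model of Figure 3 (a).

```python
import math

def gaussian_for_poisson(rate):
    """Poisson(rate) has mean `rate` and variance `rate`."""
    return rate, math.sqrt(rate)

def gaussian_for_binomial(n, p):
    """Binomial(n, p) has mean n*p and variance n*p*(1-p)."""
    return n * p, math.sqrt(n * p * (1.0 - p))

# Parameters of the Gaussian replacing Poisson(37).
mu_p, sigma_p = gaussian_for_poisson(37.0)
# Parameters of the Gaussian replacing Bin(Recruiters, 0.6) when Recruiters = 37.
mu, sigma = gaussian_for_binomial(37.0, 0.6)
```

Note that after continualization `n` is itself a real number (a draw from a Gaussian), so these formulas are applied to non-integer arguments.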

Finally, Leios smooths the observed variables by a Gaussian to ensure the 
likelihood function is both fully continuous and differentiable. In this example 
we see that the approximation of the Binomial already makes the distribution of 
Offers (given all latent values) a Gaussian, hence this final step is not needed. 

After continualization, the GPA cannot be exactly 4.0, thus we need to repair
the first conditional branch of the continualized program. In line 11, we replace
the exact equality predicate with the interval predicate $4-\theta_1 < GPA < 4+\theta_2$,
where each $\theta$ is a hole whose value Leios will synthesize. Leios finds all such
branching predicates by tracking transitive data dependencies of all continualized
variables.


2.2 Parameter Synthesis 


Our continuous approximation should be close enough to the original model
such that upon performing inference on the approximation, the estimates obtained
will also be close to the ground-truth values. Hence Leios needs to ensure
that the values synthesized for each $\theta$ are such that for every conditional statement,
the probability of executing the true branch in the continualized program
roughly matches the original (ensuring similar likelihoods). In probability theory,
this value has a natural interpretation as a continuity correction factor as

 1  Model {
 2    prior = Uniform(20,50);
 3    mu_p = prior;
 4    sigma_p = sqrt(prior);
 5    Recruiters = Gaussian(mu_p,sigma_p);
 6
 7    perfGPA = Gaussian(4,β);
 8    regGPA = 4*Beta(7,3);
 9    GPA = Mix(perfGPA,.05,regGPA,.95);
10
11    if (3.99999 < GPA < 4.95208) {
12      mu = Recruiters * 0.9;
13      sigma = sqrt(Recruiters*0.9*0.1);
14      Interviews = Gaussian(mu,sigma);
15    } else if (GPA > 3.500122) {
16      mu = Recruiters * 0.6;
17      sigma = sqrt(Recruiters*0.6*0.4);
18      Interviews = Gaussian(mu,sigma);
19    } else {
20      mu = Recruiters * 0.5;
21      sigma = sqrt(Recruiters*0.5*0.5);
22      Interviews = Gaussian(mu,sigma);
23    }
24
25    mu2 = Interviews * 0.4;
26    sigma2 = sqrt(Interviews*0.4*0.6);
27    Offers = Gaussian(mu2,sigma2);
28  }

(a)

(b) [Plot titled "Synthesized Program": one convergence curve per Beta in {0.001, 0.05, 0.1, 0.25, 0.5, 1}; x-axis: Iteration]


Fig. 3: (a) the fully continualized model and (b) Convergence of the Synthesis
Step for multiple $\beta$.


it “corrects” the probability of a predicate being true after applying continuous
approximations. For the (GPA == 4) condition, we might think about using a 
typical continuity correction factor of 0.5 [23], and transform it to 4-0.5 < GPA 
< 4+0.5. However, in that case, the second else if (GPA > 3.5) branch would 
never execute, thus significantly changing the program’s semantics (and thus the 
likelihood function). Experimentally, such an error can lead to highly inaccurate 
inference results. 
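
The effect of the naive correction factor can be checked numerically. The sketch below computes branch probabilities for the continualized GPA mixture in closed form; it assumes $\beta = 0.1$ for the Gaussian component (the value used in Figure 4), and all function names are hypothetical. The Beta(7, 3) CDF uses the standard binomial-sum identity for integer parameters.

```python
import math

def normal_cdf(x, mean, std):
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def beta_7_3_cdf(x):
    """CDF of Beta(7, 3): P(Binomial(9, x) >= 7), valid for integer parameters."""
    x = min(max(x, 0.0), 1.0)
    return sum(math.comb(9, j) * x**j * (1.0 - x)**(9 - j) for j in range(7, 10))

def gpa_cdf(x, beta):
    """CDF of the continualized GPA: 0.05*Gaussian(4, beta) + 0.95*(4*Beta(7, 3))."""
    return 0.05 * normal_cdf(x, 4.0, beta) + 0.95 * beta_7_3_cdf(x / 4.0)

def branch_probs(lo, hi, second_lo, beta):
    """Probabilities of the branches `lo < GPA < hi` and `else if GPA > second_lo`."""
    first = gpa_cdf(hi, beta) - gpa_cdf(lo, beta)
    above = 1.0 - gpa_cdf(second_lo, beta)
    # Mass above second_lo that the first branch has already consumed.
    overlap = gpa_cdf(hi, beta) - gpa_cdf(max(lo, second_lo), beta)
    return first, above - max(overlap, 0.0)

# Original program: the second branch fires with probability 0.95 * P(regGPA > 3.5).
orig_second = 0.95 * (1.0 - beta_7_3_cdf(3.5 / 4.0))
# Naive continuity correction of 0.5 around GPA == 4.
naive_first, naive_second = branch_probs(3.5, 4.5, 3.5, beta=0.1)
# Bounds reported for the synthesized program in Figure 3 (a).
synth_first, synth_second = branch_probs(3.99999, 4.95208, 3.500122, beta=0.1)
print(orig_second, naive_second, synth_second)
```

Under these assumptions the naive interval absorbs essentially all of the mass that the original program sends to the second branch, whereas the synthesized bounds leave that branch reachable.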

Hence we must synthesize a better continuity correction factor that makes the 
approximated model “closest” to the original program’s with respect to a well- 
defined distance metric between probability distributions. In this paper, we will 
use the common Wasserstein distance, which we describe later in Section 5. The 
objective function aims to find the continuity correction factors that minimize 
the Wasserstein distance between the original and continualized models. 
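
For intuition, a common sample-based estimate of the 1-Wasserstein distance between two one-dimensional distributions sorts equal-sized samples and averages the pairwise gaps. The sketch below illustrates that estimator only; it is an assumption for illustration and not necessarily the numerical approximation that Leios uses.

```python
def wasserstein_1d(xs, ys):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    # In one dimension the optimal coupling pairs the sorted samples in order.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

original = [0.0, 1.0, 2.0, 3.0]
shifted = [x + 0.5 for x in original]
distance = wasserstein_1d(original, shifted)
```

A synthesizer could evaluate such a distance between samples of the original and continualized models for each candidate setting of the holes and keep the setting with the smallest value.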

Figure 3 (a) shows the continualized model. Leios calculated that the optimal
values for the first branch are $\theta_1 = 0.00001$ (hence the lower bound is 3.99999)
and $\theta_2 = 0.95208$ (hence the upper bound is 4.95208) in line 11, and $\theta_3 = 0.00012$
(hence the lower bound is 3.500122) for the branch in line 15. Intuitively the
synthesizer found the upper bound 4.95208 so that any sample larger than 4 
(which must have come from the right tail of the continualized perfect GPA) 
is consumed by the first branch, instead of accidentally being consumed by the 
second branch. 


372 J. Laurel and S. Misailovic 


[Figure: two histograms over Offers. Left panel: "Original and Naive Versions" (legend: Naive, Original). Right panel: "Continualized Version" (legend: Leios).]


Fig. 4: Visual comparison between Model Distribution of Original Program
with Naive Smoothing and Leios (both with $\beta$ = 0.1)


Another part of the synthesis step is to make sure that approximations do
not introduce run-time errors. Since Interviews is now sampled from a Gaussian,
there is a small possibility that it could become negative, thus causing
a runtime error (since we later take its square root). By dynamically sampling
the continualized model during the parameter synthesis, as part of a light-weight
auto-tuning step, Leios checks if such an error exists. If it does, Leios can instead
use a Gamma approximation (which is always non-negative).
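
The text does not say how the Gamma parameters are chosen. One natural option, assumed here purely for illustration, is to match the mean and standard deviation of the Gaussian it replaces. The helper name `gamma_params` and the example mean and standard deviation are hypothetical.

```python
import random

def gamma_params(mean, std):
    """Shape and scale of the Gamma with the given mean and standard deviation."""
    if mean <= 0 or std <= 0:
        raise ValueError("mean and std must be positive")
    shape = (mean / std) ** 2      # mean = shape * scale
    scale = std ** 2 / mean        # variance = shape * scale**2
    return shape, scale

rng = random.Random(1)
shape, scale = gamma_params(mean=2.0, std=1.5)
gauss_draws = [rng.gauss(2.0, 1.5) for _ in range(5000)]
gamma_draws = [rng.gammavariate(shape, scale) for _ in range(5000)]
# Count draws that would make a later sqrt() fail.
negative_gauss = sum(1 for x in gauss_draws if x < 0)
negative_gamma = sum(1 for x in gamma_draws if x < 0)
```

With a small mean relative to the standard deviation, the Gaussian produces negative draws while the Gamma does not, which is the failure mode the auto-tuning step is meant to catch.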

While continualization incurs additional computational cost, this cost is typically
amortized. In particular, continualization needs to be performed only once.
The continualized model can then be used multiple times for inference on
different data sets. Further, we experimentally observed that our synthesis step
is fast. In this example, for all the values of $\beta$ we evaluated, this step required
only a few hundred iterations to converge to the optimal continuity correction
factors, as shown in Figure 3 (b).


2.3 Improving Inference 


Upon constructing the continuous approximation of the model, we now wish to
perform inference by conditioning upon the outcomes of 25 sampled students.
To make a fair comparison, we compile both the original and continuous versions
down to WebPPL [26] and run MCMC inference (with 3500 samples and a burn-in
of 700). We also seek to understand how smoothing latent variables improves
inference, thus we also compare against a naively continualized version where
only the observed variable was smoothed, using the same $\beta$, number of MCMC
samples, and burn-in.

Figure 4 presents the distribution of the Offers variable in the original
model, naively smoothed model, and the Leios-optimized model. The continuous
approximation achieved by Leios is smooth and unimodal, unlike the naively
smoothed approximation, which is highly multimodal. However, all models have
similar means.

Using these three models for inference, Figure 5 (a) presents the posterior
distribution of the variable param for each approach. We finally take the mean as

(a) [Plot titled "Posterior (True Val: 37)": posterior densities for Leios, Naive, and Original; x-axis: Parameter]

(b)
Metric       Leios  Naive  Original
Accuracy     0.058  0.069  0.090
Runtime (s)  0.604  0.631  0.805


Fig. 5: (a) Posteriors of each method; the true value is equal to 37. (b) Avg.
Accuracy and Inference time; the bars represent accuracy (left Y-axis), the lines
represent time (right Y-axis).


the point-estimate, $\tau_{est}$, of the parameter’s true value $\tau$. Figure 5 (b) presents the
run time and the error ratio, $|\tau - \tau_{est}| / \tau$, for each approach (for the given true value
of 37). It shows that our continualized version leads to the fastest inference.


3 Syntax and Semantics of Programs 


We present the syntax and semantics of the probabilistic programming language
on which our analysis is defined.


3.1 Source Language Syntax 


Program      ::= DataBlock? ; Model { Stmt+ } ; ObserveBlock? ; return Var ;

Stmt         ::= skip | abort | Var := Expr | Var := Dist | CONST Var := Expr
               | Stmt ; Stmt | { Stmt } | condition ( BExpr )
               | if ( BExpr ) Stmt else Stmt | for i = Int to Int Stmt
               | while ( BExpr ) Stmt

Expr         ::= Expr ArithOp Expr | f(Expr) | Real | Int | Var

BExpr        ::= BExpr or BExpr | BExpr and BExpr | not BExpr
               | Expr RelOp Expr | ( BExpr )

DataBlock    ::= Data := [(Int | Real)*]

ObserveBlock ::= for D in Data { factor(Var, D); }

Dist         ::= ContDist | DiscDist


ContDist ∈ { Gaussian, Uniform, etc. }, DiscDist ∈ { Binomial, Bernoulli, etc. }
ArithOp ∈ { +, -, *, /, ** }, f ∈ { log, abs, sqrt, exp }, RelOp ∈ { <, ≤, == }


The syntax is similar to the ones used in [24, 51]. Unlike [51], our syntax does include
exact equality predicates, which introduce difficulties during the approximation. To give
the developer the flexibility in selecting which parts of the program to continualize,
we add the CONST annotation. It indicates that the variable’s distribution should not
be continualized. Unless explicitly noted, we will not use this annotation in the rest
of the paper. For simplicity of exposition, we present only a single DataBlock and
ObserveBlock, but our approach naturally extends to the cases with multiple data and
observed variables.


Measure Theory Preliminaries Though various semantics have been proposed 
[44, 36, 7], we adapt the sub-probability measure transformer semantics of Dahlqvist et 
al. [19]. We will use the terms distribution and measure interchangeably. 


Definition 1. A program state $\sigma \in S$ is an n-tuple of real numbers: $S = \mathbb{R}^n$, where the
$i$-th tuple element corresponds to the $i$-th program variable’s value.


Definition 2. A $\sigma$-algebra on a set $X$ (denoted as $\Sigma_X$) is a collection of subsets of $X$
such that (1) $X \in \Sigma_X$, (2) $X_i \in \Sigma_X \Rightarrow X_i^c \in \Sigma_X$ (closure under complementation),
and (3) $X_1, X_2 \in \Sigma_X \Rightarrow X_1 \cup X_2 \in \Sigma_X$ (closure under countable union). The tuple
$(X, \Sigma_X)$ is called a measurable space. Our semantics is defined on the Borel measurable
space $(\mathbb{R}^n, \mathcal{B}\{\mathbb{R}^n\})$ where $\mathcal{B}\{\mathbb{R}^n\}$ is the standard Borel $\sigma$-algebra over $\mathbb{R}^n$.


Definition 3. A measure $\mu$ over $\mathbb{R}^n$ is a mapping from $\mathcal{B}\{\mathbb{R}^n\}$ to $[0, +\infty)$ such that
$\mu(\emptyset) = 0$ and $\mu(\bigcup_{i \in \mathbb{N}} X_i) = \sum_{i \in \mathbb{N}} \mu(X_i)$ when all $X_i$ are mutually disjoint. A probability
measure is a measure that satisfies $\mu(\mathbb{R}^n) = 1$ and a sub-probability measure is one
satisfying $\mu(\mathbb{R}^n) \le 1$. The simplest measure is the Dirac measure, denoted as $\delta_{a_i}(S) = 1$
if $a_i \in S$ else 0. We denote the set of all sub-probability measures as $\mathcal{M}(\mathbb{R}^n)$.


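Definition 4. Given measures $\mu_1, \mu_2 \in \mathcal{M}(\mathbb{R})$, the product measure $\mu_1 \otimes \mu_2 \in \mathcal{M}(\mathbb{R}^2)$
is defined as $\mu_1 \otimes \mu_2(B_1 \times B_2) = \mu_1(B_1)\,\mu_2(B_2)$ for $B_1, B_2 \in \mathcal{B}\{\mathbb{R}\}$.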


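Definition 5. Given a measure $\mu \in \mathcal{M}(\mathbb{R}^n)$, the marginal measure of a variable $x_i$ is
defined as $\mu_{x_i}(B_i) = \mu(\mathbb{R} \times \dots \times \mathbb{R} \times B_i \times \mathbb{R} \times \dots)$ for $B_i \in \mathcal{B}\{\mathbb{R}\}$.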


Definition 6. A kernel is a function $\kappa : S \to \mathcal{M}(\mathbb{R}^n)$ mapping states to measures.


Definition 7. The Lebesgue measure on $\mathbb{R}$ (denoted Leb) is the measure that maps
any interval to its length, e.g., $Leb([a,b]) = b - a$. The Lebesgue measure in $\mathbb{R}^n$ is
simply the n-fold product measure of n copies of the Lebesgue measure on $\mathbb{R}$.


Definition 8. A measure $\mu$ is absolutely continuous with respect to the Lebesgue measure
Leb (denoted as $\mu \ll Leb$, or simply: $\mu$ is A.C.) iff for any measurable set $S$,
$Leb(S) = 0 \Rightarrow \mu(S) = 0$.
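
As a worked illustration of Definition 8 (not part of the original text), consider the hybrid GPA distribution from the example in Section 2. Writing $\nu$ for the distribution of 4·Beta(7,3), a symbol introduced here only for this illustration, the mixture is not absolutely continuous because it places positive mass on a Lebesgue-null set:

```latex
\mu_{\mathrm{GPA}} = 0.05\,\delta_{4} + 0.95\,\nu,
\qquad
\mathrm{Leb}(\{4\}) = 0
\quad\text{but}\quad
\mu_{\mathrm{GPA}}(\{4\}) \;\ge\; 0.05\,\delta_{4}(\{4\}) = 0.05 > 0 .
```

Replacing $\delta_4$ by a Gaussian centered at 4 with standard deviation $\beta > 0$ gives a mixture of two measures that both have densities, so the continualized GPA is A.C.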


3.2 Semantics 


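Expression Level Semantics Arithmetic Expression semantics are standard: they
map states $\sigma \in \mathbb{R}^n$ to values, equivalently $\llbracket Expr \rrbracket : \mathbb{R}^n \to \mathbb{R}$. Boolean Expression
Semantics, denoted $\llbracket BExpr \rrbracket$, simply return the set of states $B_i \in \mathcal{B}\{\mathbb{R}^n\}$ satisfying the
Boolean conditional.


$\llbracket c \rrbracket(\sigma) = c$
$\llbracket x_i \rrbracket(\sigma) = \sigma[x_i]$
$\llbracket e_1 \; op \; e_2 \rrbracket(\sigma) = \llbracket e_1 \rrbracket(\sigma) \; op \; \llbracket e_2 \rrbracket(\sigma)$
$\llbracket f(e_1) \rrbracket(\sigma) = f(\llbracket e_1 \rrbracket(\sigma))$

$\llbracket B_1 \text{ and } B_2 \rrbracket = \llbracket B_1 \rrbracket \cap \llbracket B_2 \rrbracket$
$\llbracket B_1 \text{ or } B_2 \rrbracket = \llbracket B_1 \rrbracket \cup \llbracket B_2 \rrbracket$
$\llbracket \text{not } B_1 \rrbracket = \mathbb{R}^n \setminus \llbracket B_1 \rrbracket$
$\llbracket e_1 \; relop \; e_2 \rrbracket = \{ \sigma \in \mathbb{R}^n \mid \llbracket e_1 \rrbracket(\sigma) \; relop \; \llbracket e_2 \rrbracket(\sigma) \}$


Distribution Semantics The interpretation of a distribution is a kernel, $\kappa$, mapping
a state to the measure associated with the specific parametrization of the distribution
in that state. Since measures are set functions we will represent them as $\lambda$
abstractions. The signature is $\llbracket Dist \rrbracket : \mathbb{R}^n \to (\mathcal{B}\{\mathbb{R}\} \to [0, 1])$.


$\kappa_{Cont}(\sigma) = \llbracket ContDist(e_1, e_2, \dots) \rrbracket(\sigma) = \lambda S. \int_{v \in \mathbb{R}} \mathbb{1}_S(v) \cdot f_{Cont}(v; \llbracket e_1 \rrbracket(\sigma), \llbracket e_2 \rrbracket(\sigma), \dots) \, dv$

$\kappa_{Disc}(\sigma) = \llbracket DiscDist(e_1, e_2, \dots) \rrbracket(\sigma) = \lambda S. \sum_{v \in Supp \cap S} f_{Disc}(v; \llbracket e_1 \rrbracket(\sigma), \llbracket e_2 \rrbracket(\sigma), \dots)$


Where $f_{Cont}$ and $f_{Disc}$ are the density and mass functions, respectively, of the primitive
distribution being sampled from (e.g., $f_{Gauss}(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \mathbb{1}_{\{\sigma > 0\}}$)
and Supp is the distribution’s support.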
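
The expression-level semantics given earlier in this subsection can be read as a small recursive evaluator. The sketch below is a hedged illustration, not the authors' code: the tuple-based AST encoding, the function names `eval_expr` and `holds`, and the exact operator tables are assumptions. The set $\llbracket BExpr \rrbracket$ is represented by its membership test.

```python
import math
import operator

ARITH = {"+": operator.add, "-": operator.sub, "*": operator.mul,
         "/": operator.truediv, "**": operator.pow}
FUNCS = {"log": math.log, "abs": abs, "sqrt": math.sqrt, "exp": math.exp}
RELOPS = {"<": operator.lt, "<=": operator.le, "==": operator.eq}

def eval_expr(expr, state):
    """[[Expr]](sigma): map a state (dict from variable name to float) to a real."""
    kind = expr[0]
    if kind == "const":
        return float(expr[1])
    if kind == "var":
        return state[expr[1]]
    if kind == "arith":
        _, op, left, right = expr
        return ARITH[op](eval_expr(left, state), eval_expr(right, state))
    if kind == "call":
        _, name, arg = expr
        return FUNCS[name](eval_expr(arg, state))
    raise ValueError(f"unknown expression: {expr!r}")

def holds(bexpr, state):
    """Membership test: is `state` an element of the set [[BExpr]]?"""
    kind = bexpr[0]
    if kind == "and":   # intersection of the two sets
        return holds(bexpr[1], state) and holds(bexpr[2], state)
    if kind == "or":    # union of the two sets
        return holds(bexpr[1], state) or holds(bexpr[2], state)
    if kind == "not":   # complement
        return not holds(bexpr[1], state)
    if kind == "rel":
        _, op, left, right = bexpr
        return RELOPS[op](eval_expr(left, state), eval_expr(right, state))
    raise ValueError(f"unknown predicate: {bexpr!r}")

sigma = {"GPA": 3.7, "Recruiters": 30.0}
# GPA > 3.5, written with the operators available in the table above.
guard = ("not", ("rel", "<=", ("var", "GPA"), ("const", 3.5)))
mean = ("arith", "*", ("var", "Recruiters"), ("const", 0.6))
```

For the state above, `holds(guard, sigma)` checks the second branch condition of the running example and `eval_expr(mean, sigma)` evaluates the corresponding Gaussian mean.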


Statement Level Semantics The statement-level semantics are shown in Figure
6. We interpret each statement as a (sub) measure transformer, hence the semantic
signature is $\llbracket Statement \rrbracket : \mathcal{M}(\mathbb{R}^n) \to \mathcal{M}(\mathbb{R}^n)$. The skip statement returns the original
measure and the abort statement transforms any measure to the 0 sub-measure. The
condition statement removes measure from regions not satisfying the Boolean guard
B. The factor statement can be seen as a “smoothed” version of condition that uses g,
a function of the observed data and its distribution, to re-weight the measure associated
with a set by some real value in [0,1] (as opposed to strictly 0 or 1). Deterministic
assignment transforms the measure into one which assigns to any set of states $S$ the
same value that the original measure $\mu$ would have assigned to all states that end
up in $S$ after executing the assignment statement. Probabilistic Assignment updates
the measure so that $x_i$’s marginal is the measure associated with Dist, but with the
parameters governed by $\mu$.

An if else statement can be decomposed into the sum of the true branch’s measure
and the false branch’s measure. The while loop semantics are the solution to the
standard least fixed point equation [19], but can also be viewed as a mixture distribution
where each mixture component corresponds to going through the loop k times.
A for loop is just syntactic sugar for a sequencing of a fixed number of statements.
We note that the Data block does not affect the measure (it is also syntactic sugar,
and could simply be inlined in the Observe block). The program can be thought of as
starting in some initial input measure $\mu_0$ where each variable is undefined (which could
simply mean initialized to some special value or even just zero), and as each variable
gets defined, that variable’s marginal (and hence the joint measure $\mu$) gets updated.
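
The measure-transformer reading of statements can be illustrated on measures with finite support, where a sub-probability measure is just a dictionary from states to mass. This restriction, and all the names below, are assumptions made for illustration; the paper's semantics is defined over Borel measures on $\mathbb{R}^n$.

```python
from collections import defaultdict

def abort(mu):
    """Send every input measure to the zero sub-measure."""
    return {}

def condition(pred):
    """Keep only the mass on states satisfying `pred`."""
    return lambda mu: {s: w for s, w in mu.items() if pred(s)}

def assign(index, fn):
    """x_index := fn(state): push the mass forward along the assignment."""
    def run(mu):
        out = defaultdict(float)
        for s, w in mu.items():
            out[s[:index] + (fn(s),) + s[index + 1:]] += w
        return dict(out)
    return run

def sample(index, dist):
    """x_index := Dist: `dist(state)` returns a dict from value to probability."""
    def run(mu):
        out = defaultdict(float)
        for s, w in mu.items():
            for v, p in dist(s).items():
                out[s[:index] + (v,) + s[index + 1:]] += w * p
        return dict(out)
    return run

def seq(*stmts):
    """P1; P2; ...: apply the transformers left to right."""
    def run(mu):
        for stmt in stmts:
            mu = stmt(mu)
        return mu
    return run

def if_else(pred, then_stmt, else_stmt):
    """Sum of the two branch measures, each run on its conditioned input."""
    def run(mu):
        out = defaultdict(float)
        for part in (then_stmt(condition(pred)(mu)),
                     else_stmt(condition(lambda s: not pred(s))(mu))):
            for s, w in part.items():
                out[s] += w
        return dict(out)
    return run

def total_mass(mu):
    return sum(mu.values())

# State = (coin, payout). Flip a fair coin, pay 10 on heads and 0 on tails,
# then observe that the payout is positive.
program = seq(
    sample(0, lambda s: {0: 0.5, 1: 0.5}),
    if_else(lambda s: s[0] == 1, assign(1, lambda s: 10), assign(1, lambda s: 0)),
    condition(lambda s: s[1] > 0),
)
result = program({(0, 0): 1.0})
```

The output measure has total mass below one because `condition` discards mass rather than renormalizing, which is why the semantics works with sub-probability measures.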


4 Continualizing Probabilistic Programs 


Our goal is to synthesize a new continuous approximation of the original program P.
We formally define this via a transformation operator $T_P^{\beta}\llbracket \bullet \rrbracket : Program \to Program$.
Our approach operates in two main steps:


(1) We first locally approximate the program’s prior and latent variables using a series 
of program transformations to best preserve the local structural properties of the 
program and then apply smoothing globally to ensure that the likelihood function 
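is both fully continuous and differentiable.


$\llbracket \text{skip} \rrbracket(\mu) = \mu$
$\llbracket \text{abort} \rrbracket(\mu) = \lambda S. 0$
$\llbracket P_1 ; P_2 \rrbracket(\mu) = \llbracket P_2 \rrbracket(\llbracket P_1 \rrbracket(\mu))$

$\llbracket \text{condition}(B) \rrbracket(\mu) = \lambda S. \mu(S \cap \llbracket B \rrbracket)$
$\llbracket \text{factor}(x_k, t) \rrbracket(\mu) = \lambda S. \int_{\mathbb{R}^n} \mathbb{1}_S(\sigma) \cdot g(t, \sigma) \cdot \mu(d\sigma)$

$\llbracket x_i := e \rrbracket(\mu) = \lambda S. \mu(\{ (x_1, \dots, x_n) \in \mathbb{R}^n \mid (x_1, \dots, x_{i-1}, \llbracket e \rrbracket(x_1, \dots, x_n), x_{i+1}, \dots, x_n) \in S \})$

$\llbracket x_i := Dist(e_1, \dots, e_k) \rrbracket(\mu) = \lambda S. \int_{\mathbb{R}^n} \mu(d\sigma) \cdot \delta_{x_1} \otimes \dots \otimes \delta_{x_{i-1}} \otimes \llbracket Dist(e_1, \dots, e_k) \rrbracket(\sigma) \otimes \delta_{x_{i+1}} \otimes \dots \otimes \delta_{x_n}(S)$

$\llbracket \text{if } (B) \{P_1\} \text{ else } \{P_2\} \rrbracket(\mu) = \llbracket P_1 \rrbracket(\llbracket \text{condition}(B) \rrbracket(\mu)) + \llbracket P_2 \rrbracket(\llbracket \text{condition}(\text{not } B) \rrbracket(\mu))$

$\llbracket \text{while } (B) \{ P_1 \} \rrbracket(\mu) = \sum_{k=0}^{\infty} \llbracket (\text{condition}(B) ; P_1)^k ; \text{condition}(\text{not } B) \rrbracket(\mu)$


Fig. 6: Denotational Semantics of Probabilistic Programs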


(2) We next synthesize a set of parameters that (approximately) minimize the distance 
metric between the distributions of the original and continualized models and we 
use light-weight auto-tuning to ensure the approximations do not introduce run- 
time errors. 


4.1 Overview of the Algorithm 


Algorithm 1 presents the technique for continualizing programs. It takes as input a
program P containing a prior or observed variable that is discrete (or hybrid) and
returns $T_P^{\beta}\llbracket P \rrbracket$, a probabilistic program representing a fully continuous random variable
with a differentiable likelihood function. The algorithm uses a tunable hyper-parameter
$\beta \in (0, \infty)$ to control the amount of smoothing (like in [14]). A smaller $\beta$ leads to less
smoothing, while a larger $\beta$ leads to more smoothing; however, the smallest $\beta$ does not
always lead to the best inference, and vice-versa, as can be seen in Section 7.

In line 3 of Algorithm 1, Leios constructs a standard control flow graph (CFG)
to represent the program, using a method called GetCFG(). This data structure
forms the basis of Leios’s later analyses. Each CFG node corresponds to a single
statement and contains all relevant attributes of that statement. Leios then uses this
CFG to build a data dependency graph (line 4), which is used for checking which
variables are tainted by the approximations. In line 5, Leios applies $T_P^\beta[\cdot]$ to
obtain a continualized sketch, $P_C$. Lastly, Leios synthesizes the optimal continuity
correction parameters (line 7) and, in doing so, samples the program to detect whether a
runtime error occurs, also returning a Boolean flag success to convey this information.
If a runtime error did occur, we find the expression causing it (line 9) and then, in
lines 10-12, reapply the safer transformations (e.g., Gamma instead of Gaussian) to all
dependencies that could have contributed to the runtime error.
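
The retry structure described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not Leios’s implementation: every callable passed to continualize, the dictionary-based program representation, and the toy rules below are hypothetical stand-ins introduced for this example.

```python
def continualize(program, transform, synthesize, find_invalid, deps_of,
                 make_safer, max_rounds=10):
    """Retry loop in the spirit of Algorithm 1 (all callables are hypothetical)."""
    sketch = transform(program)                    # line 5: first-choice transformations
    for _ in range(max_rounds):                    # line 6: loop until acceptable
        sketch, ok = synthesize(sketch, program)   # line 7: fit parameters, detect errors
        if ok:
            return sketch
        for expr in deps_of(find_invalid(sketch)): # lines 9-12: fall back to safer rules
            sketch = make_safer(sketch, expr)
    raise RuntimeError("no acceptable continualization found")


# Toy instantiation: a "program" maps variable names to distribution names.
FIRST_CHOICE = {"Poisson": "Gaussian", "Binomial": "Gaussian"}
SAFER_CHOICE = {"Gaussian": "Gamma"}

def toy_transform(program):
    return {var: FIRST_CHOICE.get(dist, dist) for var, dist in program.items()}

def toy_synthesize(sketch, original):
    # Assumption for the demo: sampling fails while "rate" may go negative.
    return sketch, sketch.get("rate") != "Gaussian"

def toy_find_invalid(sketch):
    return "rate"

def toy_deps(variable):
    return [variable]

def toy_make_safer(sketch, variable):
    updated = dict(sketch)
    updated[variable] = SAFER_CHOICE.get(sketch[variable], sketch[variable])
    return updated

result = continualize({"rate": "Poisson", "noise": "Gaussian"}, toy_transform,
                      toy_synthesize, toy_find_invalid, toy_deps, toy_make_safer)
print(result)
```

In the toy run, the first attempt fails, the offending variable is switched from Gaussian to Gamma, and the second attempt is accepted.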


4.2 Distribution and Expression Transformations 


To continualize each variable, Leios mutates the individual distributions and expressions
assigned to latent variables within the program. We use a transform operator for
expressions and distributions $T_E^\beta[\cdot] : \mathit{Expr} \cup \mathit{Dist} \to \mathit{Expr} \cup \mathit{Dist}$, which we define next.


Algorithm 1: Procedure for Continualizing a Probabilistic Program

 1 function Continualize(P, $\beta$);
   Input : A probabilistic program P containing discrete/hybrid observable
           variables and/or priors and a smoothing factor $\beta > 0$
   Output: A fully continuous probabilistic program $P_C$
 2 Acceptable ← False;
 3 CFG ← GetCFG(P);
 4 DataDepGraph ← ComputeDataFlow(CFG);
 5 $P_C$ ← $T_P^\beta[P]$;            /* apply all continuous transformations */
 6 while not Acceptable do
 7     $P_C$, success ← Synthesize($P_C$, P);
 8     if not success:
 9         D ← getInvalidExpression();
10         Deps ← getDependencies(DataDepGraph, D);
11         forall Expression in Deps do
12             $P_C$ ← reapplySafeTransformation($P_C$, Expression);
13     else:
14         Acceptable ← True;
15 end
16 return $P_C$


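Transform Operator For Distributions and Expressions We now detail
the full list of continuous probability distribution transformations that $T_E^\beta[\cdot]$ uses.


\[
T_E^\beta[E] =
\begin{cases}
\mathrm{Gaussian}(\lambda, \sqrt{\lambda}) & E = \mathrm{Poisson}(\lambda) \\
\mathrm{Gamma}(\lambda, 1) & E = \mathrm{Poisson}(\lambda) \text{ \& Gaussian fails} \\
\mathrm{Gaussian}(np, \sqrt{np(1-p)}) & E = \mathrm{Binomial}(n, p) \\
\mathrm{Gamma}(n, p) & E = \mathrm{Binomial}(n, p) \text{ \& Gaussian fails} \\
\mathrm{Uniform}(a, b) & E = \mathrm{DiscUniform}(a, b) \\
\mathrm{Exponential}(p) & E = \mathrm{Geometric}(p) \\
\mathrm{MixOfGauss}_\beta([(1, p), (0, 1-p)]) & E = \mathrm{Bernoulli}(p) \\
\mathrm{Beta}(\beta, \beta \tfrac{1-p}{p}) & E = \mathrm{Bernoulli}(p) \text{ \& MixOfGauss fails} \\
\mathrm{Mixture}([(T_E^\beta[D_1], p_1), \dots, (T_E^\beta[D_k], p_k)]) & E = \mathrm{Mixture}([(D_1, p_1), \dots, (D_k, p_k)]) \\
\mathrm{Gaussian}(c, \beta) & E = c \text{ (constant)} \\
E & E = a x_i + b \ (a \neq 0) \\
\mathrm{KDE}(\beta) & E \in \mathit{DiscDist} \text{ \& not covered} \\
\mathrm{Gaussian}(E, \beta) & \text{otherwise}
\end{cases}
\]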


The rationale for this definition is twofold. First, these approximations all preserve key
structural properties of the distributions’ shape (e.g., the number of modes), which have been
shown to strongly affect the quality of inference [25, 45, 17]. Second, these continuous
approximations all match the first moment of their corresponding discrete distributions,
which is another important feature that affects the quality of approximation [53]. We
refer the reader to [54] to see that for each discrete distribution, the corresponding
continuous approximation has the same mean. These approximations are best
when certain limit conditions are satisfied, e.g., $\lambda > 10$ for approximating a Poisson
distribution with a Gaussian; hence the values in the program itself do affect the overall
approximation accuracy.
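
As a concrete reading of the first-choice rules, the sketch below maps a few discrete distributions to their continuous counterparts and their parameters. The function name and the tuple encoding are assumptions made for this example, and the second Gaussian parameter is taken to be a standard deviation, as the square roots in the definition suggest.

```python
import math

def continualize_distribution(name, *params):
    """Return (continuous_name, params) for a few first-choice rules (illustrative only)."""
    if name == "Poisson":
        (lam,) = params
        return ("Gaussian", (lam, math.sqrt(lam)))            # mean lam, std sqrt(lam)
    if name == "Binomial":
        n, p = params
        return ("Gaussian", (n * p, math.sqrt(n * p * (1 - p))))
    if name == "DiscUniform":
        a, b = params
        return ("Uniform", (a, b))
    if name == "Geometric":
        (p,) = params
        return ("Exponential", (p,))
    raise ValueError(f"no rule sketched for {name}")

print(continualize_distribution("Poisson", 16.0))
print(continualize_distribution("Binomial", 100, 0.5))
```

For Poisson and Binomial, the first Gaussian parameter equals the discrete mean ($\lambda$ and $np$ respectively), which is the mean-matching property discussed above.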

However, if we are not careful, a statement-level transformation could introduce
runtime errors. For example, a Binomial is always non-negative, but its Gaussian
approximation could be negative. This is why $T_E^\beta[\cdot]$ has multiple transformations for the
same distribution. For example, in addition to using a Gaussian to approximate both a
Binomial and a Poisson, we also have a Gamma approximation, since a Gamma
distribution is always non-negative. Likewise we have a Beta approximation to a Bernoulli
if we require that the approximation also have support in the range [0,1]. Leios uses
auto-tuning to safeguard against such errors during the synthesis phase: when
sampling the transformed program, if we encounter a run-time error of this nature,
we simply go back and try a safer (but possibly slower) alternative (Algorithm 1 line
12). Since there are only finitely many variables and (safer) transformations to apply,
this process will eventually terminate. For discrete distributions not supported by the
specific approximations, but with fixed parameters, we empirically sample them to get
a set of samples and then use a Kernel Density Estimate (KDE) [62] with a Gaussian
kernel (the KDE bandwidth is precisely $\beta$) as the approximation.
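
To see why a fallback is needed, one can compute how much probability mass the Gaussian approximation of a Poisson places below zero. The helper names below are hypothetical, and the Gaussian is parameterized by mean $\lambda$ and standard deviation $\sqrt{\lambda}$ as in the definition above.

```python
import math

def normal_cdf(x, mean, std):
    """CDF of a Gaussian with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def negative_mass_of_gaussian_approx(lam):
    """Mass below zero of Gaussian(lam, sqrt(lam)), the approximation of Poisson(lam)."""
    return normal_cdf(0.0, lam, math.sqrt(lam))

for lam in (1.0, 4.0, 25.0):
    print(lam, negative_mass_of_gaussian_approx(lam))
```

The negative mass is roughly 16% for $\lambda = 1$ but negligible for $\lambda = 25$, which is consistent with the observation that the Gaussian rule is most suitable when the limit conditions hold, and that a non-negative alternative such as Gamma is the safer choice otherwise.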

Lastly, by default all discrete random variables are approximated with continuous
versions; however, we leave the user the option to manually specify CONST in
front of a variable if they do not wish for it to be approximated (in which case we no
longer make any theoretical guarantees about continuity).


4.3 Influence Analysis and Control-Flow Correction of Predicates 


Simply changing all instances of discrete distributions in the program to continuous 
ones is not enough to closely approximate the semantics of the original program. We 
additionally need to ensure that such changes do not introduce control flow errors into 
the program, in the sense that quantitative properties such as the probability of taking 
a particular branch need to be reasonably preserved. 


Avoiding Zero Probability Events A major concern of the approximation is 
to ensure that no zero-probability events are introduced, such as when we have an 
exact equality “==” predicate in an if, observe or while statement and the vari- 
able being checked was transformed from a discrete to a continuous type. For example, 
discrete programs commonly have a statement like x := Poisson(1) followed by a con- 
ditional such as if (x==4), because the probability that a discrete random variable 
is exactly equal to a value can be non-zero. However upon applying our distribution 
transformations and transforming the distribution of x from a discrete Poisson to a con- 
tinuous Gaussian, the conditional statement “if (x==4)” now corresponds to a zero 
probability (or measure zero) event, as the probability that an absolutely continuous 
probability measure assigns to the singleton set {4} is by definition zero. Thus, if not 
corrected for, we could significantly change the probabilities of taking certain branches 
and hence the overall distribution of the program. 

The converse can also be true: applying approximations can make a zero proba- 
bility event in the original program now have non-zero probability. For example, in 
x := DiscUniform(1,5); if (x<3 and x>2) the true branch has probability zero of 
executing but this becomes non-zero after approximations are applied. However, the 
branch paths like these in the original model could be identified by symbolic analysis 
(e.g., [24]) and removed via dead code elimination during pre-processing.


Correcting Control Flow Probabilities via Static Analysis To prevent
zero-probability events and ensure that the branch execution probabilities of the
continualized program closely match the original’s, we use data dependence analysis to
track which if, while or condition statements have logical comparisons with
variables “tainted” by the approximations. A variable v is “tainted” if it has a transitive
data dependence on an approximated variable, and we use reaching definitions analysis
[35] on the program’s CFG to identify these.

As shown in Algorithm 1 line 4, to compute the reaching definitions analysis we use
a method called ComputeDataFlow() as part of a pre-transformation pass whereby, for
each program point in the CFG, each variable is marked with all the other variables
on which it has a data dependence. These annotations are stored in a data structure
called DataDepGraph, which maps nodes (program points) to sets of tuples, where
each tuple contains a variable, the other variables it depends on (and where they are
assigned), and lastly, whether it will become tainted. Note that in the algorithm this
step is done before the previously discussed expression-level transformations, which is why
ComputeDataFlow() marks which variables will become continualized and which ones
will not (i.e., if a variable already defines a continuous random variable or was annotated
with CONST). Furthermore, though we are computing the data dependencies before the
approximations, because the approximations do not re-order or remove statements, all
data dependencies will be the same before and after applying the approximations.
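
The notion of a transitively tainted variable can be illustrated with a small fixpoint computation. Note that this sketch is a flow-insensitive over-approximation over a list of assignments, not the per-program-point reaching definitions analysis that Leios performs; the function name and input encoding are assumptions for this example.

```python
def tainted_variables(assignments, approximated):
    """Propagate taint through data dependences until a fixpoint is reached.

    assignments:  list of (target, set_of_source_variables) pairs.
    approximated: variables whose distributions were continualized.
    """
    deps = {}
    for target, sources in assignments:
        deps.setdefault(target, set()).update(sources)

    tainted = set(approximated)
    changed = True
    while changed:
        changed = False
        for target, sources in deps.items():
            if target not in tainted and sources & tainted:
                tainted.add(target)   # target depends on something already tainted
                changed = True
    return tainted

program = [("x", set()), ("y", {"x"}), ("z", {"y"}), ("w", set())]
print(sorted(tainted_variables(program, {"x"})))
```

Here y and z become tainted because they depend (directly or transitively) on the approximated variable x, while w does not.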


Transform Operator For Boolean Expressions We take all such control
predicates that contain an exact equality “==” comparison with a tainted variable and
transform these predicates from exact equality predicates to interval-style predicates.
Thus if we originally had a predicate of the form if (x==4) we will mutate this into a
predicate of the form if (x > 4-$\theta_1$ && x < 4+$\theta_2$), where the $\theta_i$ are placeholder values that
will need to be filled with concrete values during the synthesis phase (Section 5). Hence
checking for exact equality gets relaxed to checking for containment within the interval
$(4 - \theta_1, 4 + \theta_2)$. We also need to correct < and <= predicates if one of the variables was
approximated or transitively affected by an approximation.

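Hence we also define our transform operator $T_B^\beta[\cdot] : \mathit{BExpr} \to \mathit{BExpr}$ at the level
of Boolean expressions:


\[
T_B^\beta[(x == y)] =
\begin{cases}
(y - \theta_1 < x) \text{ and } (x < y + \theta_2) & \text{default} \\
(x == y) & \text{CONST } x \text{ and CONST } y \text{ specified}
\end{cases}
\]
\[
T_B^\beta[(x < y)] =
\begin{cases}
(x < y + \theta) & \text{if } x \text{ or } y \text{ tainted} \\
(x < y) & \text{otherwise}
\end{cases}
\qquad
T_B^\beta[(x \le y)] =
\begin{cases}
(x \le y + \theta) & \text{if } x \text{ or } y \text{ tainted} \\
(x \le y) & \text{otherwise}
\end{cases}
\]


Because we have already pre-computed DataDepGraph, one can check if a variable in
a given statement or expression is tainted (or marked as CONST) in constant time.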
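
A small numerical check illustrates the equality relaxation. The choice of Poisson(4), its Gaussian(4, 2) approximation, and $\theta_1 = \theta_2 = 0.5$ are hypothetical values picked for this example; in Leios the $\theta_i$ are synthesized rather than fixed.

```python
import math

def poisson_pmf(k, lam):
    """Probability that a Poisson(lam) variable equals k."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def normal_cdf(x, mean, std):
    """CDF of a Gaussian with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def relaxed_equality_probability(value, theta1, theta2, mean, std):
    """Pr(value - theta1 < X' < value + theta2) for a Gaussian X'."""
    return normal_cdf(value + theta2, mean, std) - normal_cdf(value - theta1, mean, std)

exact = poisson_pmf(4, 4.0)                                    # Pr(x == 4), discrete
relaxed = relaxed_equality_probability(4, 0.5, 0.5, 4.0, 2.0)  # interval predicate
print(exact, relaxed)
```

The untransformed predicate x == 4 would have probability zero under the Gaussian, whereas the interval predicate recovers a branch probability close to the discrete one (about 0.197 versus 0.195 in this instance).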

This correction has a natural interpretation in classical probability theory. It is
well known that to approximate a discrete distribution $X$ with a continuous one $X'$,
we need a continuity correction factor, $\theta$, such that $P(X \le x) \approx P(X' \le x + \theta)$ (which is
why $T_B^\beta[\cdot]$ also corrects < and <= predicates). For simple approximations (e.g., Binomial
to Gaussian), the canonical correction factor is known ($\theta = 0.5$) [23]; however, for the
general case it is not. Furthermore, it has been shown that in many cases 0.5 is not
the best correction factor [3].
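
The effect of the canonical correction can be checked numerically for one instance. The sketch below compares the exact Binomial CDF with its Gaussian approximation with and without $\theta = 0.5$; the parameters n = 20, p = 0.5, x = 10 are arbitrary values chosen for illustration.

```python
import math

def binomial_cdf(x, n, p):
    """Exact Pr(X <= x) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x + 1))

def normal_cdf(x, mean, std):
    """CDF of a Gaussian with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

n, p, x = 20, 0.5, 10
mean, std = n * p, math.sqrt(n * p * (1 - p))

exact = binomial_cdf(x, n, p)
uncorrected = normal_cdf(x, mean, std)        # theta = 0
corrected = normal_cdf(x + 0.5, mean, std)    # theta = 0.5
print(exact, uncorrected, corrected)
```

Without the correction the Gaussian gives exactly 0.5, far from the exact value of about 0.588, while shifting the threshold by 0.5 brings the two within about 0.0004 of each other.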


4.4 Bringing it all together: Full Program Transformations


Having defined the transformation for distributions, arithmetic and Boolean
expressions, we now define the program transformation operator $T_P^\beta[\cdot] : \mathit{Program} \to \mathit{Program}$
inductively:


\[
\begin{aligned}
T_P^\beta[P_1; P_2] &= T_P^\beta[P_1];\ T_P^\beta[P_2] \\
T_P^\beta[\texttt{if}\ (B)\ \{P_1\}\ \texttt{else}\ \{P_2\}] &= \texttt{if}\ (T_B^\beta[B])\ \{T_P^\beta[P_1]\}\ \texttt{else}\ \{T_P^\beta[P_2]\} \\
T_P^\beta[\texttt{while}\ (B)\ \{P_1\}] &= \texttt{while}\ (T_B^\beta[B])\ \{T_P^\beta[P_1]\} \\
T_P^\beta[\texttt{condition}(B)] &= \texttt{condition}(T_B^\beta[B]) \\
T_P^\beta[x := E] &= x := T_E^\beta[E] \\
T_P^\beta[\texttt{CONST}\ x := E] &= x := E
\end{aligned}
\]


The abort, factor and skip statements and the DataBlock remain the same after
applying the transformation operator $T_P^\beta[\cdot]$.


Ensuring Smoothness Upon applying the statement-level transformations and
performing both dataflow analysis and predicate mutations, Leios ensures each latent
variable comes from a continuous distribution. However, a continuous distribution may
still have jump discontinuities or non-differentiable regions in its density function (such
as a uniform distribution), which can make inference difficult [66]. Furthermore, it is
known that performing parameter estimation on data that is distributed according
to a discontinuous or non-smooth density function, or on distributions with
non-smooth likelihoods, can be just as challenging [50, 1, 59]. Thus, to make the program’s
likelihood function and the density function of the observed data fully smooth, we need to
apply additional Gaussian smoothing.

Since it would be redundant to apply smoothing if we already knew the variable
came from a smooth distribution (as in the example), we make this simple check
first. The following transformation performs this on the observed variables (which
appear in the factor statement).


\[
T_P^\beta[x_o := E] =
\begin{cases}
x_o := E & \text{if } x_o \text{ already smooth} \\
x_o := \mathrm{Gaussian}(E, \beta) & \text{otherwise}
\end{cases}
\]


We could perform additional smoothing for every variable to ensure each has a
differentiable density; however, we empirically observed that the added variance
accumulated to the point where inference quality deteriorated. Hence we only apply the
additional smoothing to observed variables.
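
To illustrate what Gaussian smoothing does to a density with jump discontinuities, the sketch below evaluates the density of Gaussian(E, $\beta$) when E is Uniform(a, b), using the closed form of a uniform density convolved with a Gaussian kernel. The function names are hypothetical, and $\beta$ is assumed to act as a standard deviation.

```python
import math

def std_normal_cdf(z):
    """CDF of the standard Gaussian."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def smoothed_uniform_density(x, a, b, beta):
    """Density at x of U + N(0, beta^2) with U ~ Uniform(a, b)."""
    return (std_normal_cdf((x - a) / beta) - std_normal_cdf((x - b) / beta)) / (b - a)

for x in (-0.1, 0.0, 0.5, 1.0, 1.1):
    print(x, smoothed_uniform_density(x, 0.0, 1.0, 0.1))
```

The jump of the Uniform(0, 1) density at the endpoints is replaced by a smooth ramp: the smoothed density is about 1 in the interior, exactly half of that at the former jump points (up to floating-point error), and decays smoothly outside the original support.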

Having defined the statement-level transformations, we now state a theorem about
$T_P^\beta[\cdot]$ preserving continuity. As many applications may invoke inference at any point
in the program [46, 60], it is important that absolute continuity of each marginal hold
at every point.


Theorem 1. In the transformed program, $T_P^\beta[P]$, the marginal sub-probability measure
of each variable, denoted $\mu_{x_i}$, is absolutely continuous with respect to the Lebesgue
measure (denoted $\mu_{x_i}$ is A.C.) at each program point for which that variable is defined.


Proof. (sketch) To prove the theorem we will show that when any variable $x_i$ is initially
defined, it comes from an absolutely continuous distribution and furthermore that the
semantics of each statement in $T_P^\beta[P]$ preserves the absolute continuity of each marginal
measure (where $\mu_{x_i} = \mu(\mathbb{R} \times \dots \times B_i \times \mathbb{R} \times \dots \times \mathbb{R})$); equivalently, for any statement, any
(already defined) variable $x_i$ and any Borel set $B_i \in \mathcal{B}(\mathbb{R})$:


\[
\mu(\mathbb{R} \times \dots \times B_i \times \mathbb{R} \times \dots \times \mathbb{R}) \text{ is A.C.} \implies \llbracket \mathit{statement} \rrbracket(\mu)(\mathbb{R} \times \dots \times B_i \times \mathbb{R} \times \dots \times \mathbb{R}) \text{ is A.C.}
\]


Case 1. skip and abort: Since skip is the identity measure transformer, if each
defined marginal measure $\mu_{x_i}$ was A.C. before, then it will trivially be so afterward
since it is unchanged. abort sends each marginal to the 0 sub-measure (which
is trivially A.C.).


Case 2. condition and factor: Since factor and condition only lose measure we have
$\llbracket \texttt{condition}(B) \rrbracket(\mu)(S) \le \mu(S)$ and $\llbracket \texttt{factor}(x_k, t) \rrbracket(\mu)(S) \le \mu(S)$ for any Borel set S.
Thus $\mu(S) = 0 \implies \llbracket \texttt{condition}(B) \rrbracket(\mu)(S) = 0$ and $\mu(S) = 0 \implies \llbracket \texttt{factor}(x_k, t) \rrbracket(\mu)(S) = 0$,
since all measures are non-negative. Hence by transitivity, since $\mu(\mathbb{R} \times \dots \times B_i \times \mathbb{R} \times \dots \times \mathbb{R})$ is A.C.,
$\llbracket \texttt{factor}(x_k, t) \rrbracket(\mu)(\mathbb{R} \times \dots \times B_i \times \mathbb{R} \times \dots \times \mathbb{R})$ is A.C. and likewise, for similar reasons, we
have that $\llbracket \texttt{condition}(B) \rrbracket(\mu)(\mathbb{R} \times \dots \times B_i \times \mathbb{R} \times \dots \times \mathbb{R})$ is A.C.


Case 3. Assignment: Probabilistic assignment is straightforward. Since the
continualized program only samples from absolutely continuous distributions, the marginal
of the sampled variable $x_i$ will be A.C. and all other marginals $\mu_{x_j}$ were A.C. by
assumption. Deterministic assignment has to be handled carefully. In the
continualized program the only deterministic assignments will be $x_i := a * x_j + b$, for $a \neq 0$ (all
other assignments are smoothed). The marginal $\mu_{x_i}(S)$ is just $\mu_{x_j}(aS + b)$ where the
set $aS + b = \{ s \in \mathbb{R} \mid a \cdot s + b \in S \}$. However, by assumption of the A.C. of $\mu_{x_j}$,
$\mathrm{Leb}(aS + b) = 0 \implies \mu_{x_j}(aS + b) = 0$, but $\mathrm{Leb}(S) = 0 \implies \mathrm{Leb}(aS + b) = 0$ [55], hence:
$\mathrm{Leb}(S) = 0 \implies \mathrm{Leb}(aS + b) = 0 \implies \mu_{x_j}(aS + b) = 0$. Lastly, by the semantic definition
of $x_i$, we have that $\mu_{x_j}(aS + b) = 0 \implies \mu_{x_i}(S) = 0$, hence $\mathrm{Leb}(S) = 0 \implies \mu_{x_i}(S) = 0$ by
transitivity. All other marginals are unchanged, hence A.C. of each is preserved.


Case 4. Sequencing, if and while: Intuitively, since the above statements each preserve
A.C. of each marginal, any sequencing of them should too. Since the sum of two measures
that are both A.C. in each marginal is also A.C. in each marginal, if statements
preserve A.C. of each marginal. For this same reason while loops also preserve A.C.


5 Synthesis of Continuity Correction Parameters 


We now present our procedure for synthesizing optimal continuity correction parame- 
ters which covers lines 6 to 15 in Algorithm 1. This can be thought of as a “training” 
step which fits the continualized model to the original one. It is important to note that 
this step is agnostic to the observed data (it only fits to the Model), hence it need only 
be done once off-line, regardless of how many times we perform inference on new data 
sets. Furthermore, even if we do not have parameters to synthesize, this step is still 
useful for catching runtime errors caused by the approximations, so that we can go 
back and apply safer approximations if necessary. 


5.1 Optimization Framework 


Ideally, the posteriors of our approximated program $T_P^\beta[P]$ and the original P should
be reasonably close. However, a specific posterior is induced by the corresponding data
set: if our optimization objective tried to minimize the statistical distance from $T_P^\beta[P]$
to P, we would simply be over-fitting to the data and would not be able to re-use
$T_P^\beta[P]$ for new data sets with different true parameters. Instead, our objective is to
minimize the distance between the original model M, which is simply the fragment of
P that does not contain the data or observe block (and hence only defines the prior,
likelihood and latent variables), and the corresponding continualized approximation,
$T_P^\beta[M]$. To do so, we need to choose the best possible continuity correction factors,
$\theta$, for $T_P^\beta[M]$. Thus we define the “optimal” parameters as those which minimize a
distance metric d between probability measures, $d : \mathcal{M}(\mathbb{R}^n) \times \mathcal{M}(\mathbb{R}^n) \to [0, \infty)$. We
also need to ensure that the metric (a) can compute the distance between discrete and
continuous distributions and (b) is such that if models or likelihoods are close with
respect to d, the posteriors should be as well.


Wasserstein Distance We choose to use the Wasserstein distance primarily be- 
cause (1) it can measure the distance between a continuous and discrete distribution 
(unlike KL-Divergence or Total Variation Distance) and (2) prior work has shown that 
when performing inference, if using the Wasserstein distance as the chosen metric to 
approximate a likelihood, the (approximate) posteriors induced are comparable to the 
true posteriors (obtainable if one used the true likelihood) [49]. Additionally, unlike 
other metrics, the Wasserstein metric incorporates the underlying difference in geom- 
etry of the distributions (which strongly affects inference accuracy [37, 59]). 

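Let $\llbracket M \rrbracket(\mu_0)$ represent the renormalized measure associated to the observed
variables of the original model and let $\llbracket T_P^\beta[M_\theta] \rrbracket(\mu_0)$ represent the observed variables of
the continualized model, but where a given continuity correction factor $\theta$ has been
substituted in (both measures start in initial distribution $\mu_0$). Furthermore, let
$\mathcal{J} \subset \mathcal{M}(\mathbb{R}^{2n})$ represent the set of all joint measures with marginal measures $\llbracket M \rrbracket(\mu_0)$ and
$\llbracket T_P^\beta[M_\theta] \rrbracket(\mu_0)$. Hence we now define the 1-Wasserstein Distance:


\[
W(\llbracket M \rrbracket(\mu_0), \llbracket T_P^\beta[M_\theta] \rrbracket(\mu_0)) = \inf_{J \in \mathcal{J}} \int \lVert x - y \rVert \, dJ(x, y) \tag{1}
\]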


We also provide further justification for why the Wasserstein Distance is a sensible
metric to use. It is well known that a mixture of Gaussians can converge in distribution
to any continuous random variable; moreover, existing work has shown that a mixture
of Gaussians can approximate any discrete distribution in the Wasserstein Distance
arbitrarily well [20].
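
For intuition, the 1-Wasserstein distance between two empirical one-dimensional distributions with the same number of samples reduces to matching sorted samples, since the sorted pairing is an optimal coupling in that setting. The sketch below covers only this restricted case (one dimension, equal sample counts); the function name is an assumption for this example and it is not claimed to be the estimator used by Leios.

```python
def empirical_wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-size 1-D samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    # For equal-size 1-D samples, pairing sorted values is an optimal coupling.
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)

discrete_samples = [0, 0, 1, 1]
continuous_samples = [0.1, -0.1, 0.9, 1.1]
print(empirical_wasserstein_1d(discrete_samples, continuous_samples))
```

Unlike KL-Divergence, the quantity is well defined even though one sample comes from a discrete distribution and the other from a continuous one.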


Objective Function We now formulate our optimization approach as follows, where
$\hat{\theta}$ is the parameter vector minimizing the Wasserstein Distance with respect to the
original model M, and d is the number of parameters to synthesize.


\[
\hat{\theta} = \operatorname*{arg\,min}_{\theta \in (0,1)^d} W(\llbracket M \rrbracket(\mu_0), \llbracket T_P^\beta[M_\theta] \rrbracket(\mu_0)) \tag{2}
\]

To restrict the search space we follow common practice [23, 3] by requiring each $\theta_i \in (0, 1)$.
This optimization problem lacks a closed-form solution. Symbolically computing
the Wasserstein Distance is intractable, hence we numerically approximate it via the
empirical Wasserstein Distance (EWD) between observed samples of M and $T_P^\beta[M_\theta]$.
Because this step is fully dynamic (we run and sample the model), the samples are
conditioned upon successfully terminating, and hence the model’s sub-measure has
been implicitly renormalized to a full probability measure, thus justifying the use of a
fully renormalized measure in equations (1) and (2).


Algorithm 2: Synthesizing Optimal Continuity Correction Parameters

 1 Function Synthesize(P, $T_P^\beta[P]$);
   Input : A program P and a continualized sketch $T_P^\beta[P]$ with d parameters to
           be synthesized
   Output: A fully continuous probabilistic program $P_C$ and a binary flag
           denoting the existence of a runtime error
 2 if d == 0 then
 3     s ← sample($T_P^\beta[P]$, n);
 4     if s == Error then
 5         return $T_P^\beta[P]$, false
 6     end
 7 end
 8 else
 9     M, $T_P^\beta[M]$ ← getModel(P, $T_P^\beta[P]$);
10     for $\theta_i \in \mathrm{Grid}((0, 1)^d)$ do
11         p, s ← Nelder-Mead(W, $\theta_i$, M, $T_P^\beta[M]$, n, $\epsilon$, $\eta$);
12         if s == Error then
13             return $T_P^\beta[P]$, false
14         end
15         if W(p) < W($\hat{\theta}$) then
16             $\hat{\theta}$ ← p
17         end
18     end
19 end
20 return substitute($T_P^\beta[P]$, $\hat{\theta}$), true


Though intuitively we would expect that as we apply less smoothing (i.e., $\beta < 1$),
the optimal $\theta_i$ should also be smaller (less need for correction) and the continualized
program should become closer to the original, a simple negative result illustrates that this
is not always the case and that the dependence between the smoothing and continuity
correction must be non-linear.


Remark 1. $\hat{\theta}$ cannot be linearly proportional to $\beta$.


Proof. Let $X$ be the constant random variable that is 0 with probability 1 and let
$X' \sim \mathrm{Gaussian}(0, \beta)$. Furthermore, let $I := (X == 0)$ and $I_C := (-c\beta < X' < c\beta)$ be
two indicator random variables. Intuitively we want $I_C$ to have the same probability of
being true as $I$ for any $\beta$. However, if c is constant (such as 1) then $\Pr(-c\beta < X' < c\beta)$
will always be the same regardless of $\beta$ (when c = 1, the probability is always 0.68),
whereas $I$ is true with probability 1.
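
The invariance used in this argument is easy to confirm numerically. The sketch below assumes, as the value 0.68 suggests, that the second parameter of Gaussian(0, $\beta$) is a standard deviation; the helper names are hypothetical.

```python
import math

def normal_cdf(x, mean, std):
    """CDF of a Gaussian with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def central_mass(c, beta):
    """Pr(-c*beta < X' < c*beta) for X' ~ Gaussian(0, beta)."""
    return normal_cdf(c * beta, 0.0, beta) - normal_cdf(-c * beta, 0.0, beta)

for beta in (0.01, 0.1, 1.0):
    print(beta, central_mass(1.0, beta))
```

For every $\beta$ the interval with half-width $c\beta$ captures the same mass (about 0.68 for c = 1), so a correction that scales linearly with $\beta$ can never approach the probability 1 of the original equality test.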


5.2 Optimization Algorithm 


Algorithm 2 presents our approximate synthesis algorithm, which is called as a
subroutine in the main algorithm. As seen in line 2, if there are no parameters to be
synthesized (d == 0) we still sample the continualized program in hopes of uncovering
a possible runtime error (or gaining statistical confidence that one does not occur). We
check for such an error in line 4 and if one exists, we return immediately, with the flag
variable set to false (line 5).

To evaluate the EWD objective function (when there are parameters to synthesize),
Algorithm 2 follows a technique from [14] and uses a Nelder-Mead search (line 11),
due to Nelder-Mead’s well known success in solving non-convex program synthesis
problems. We first extract the fragments of the programs corresponding to the models,
M and $T_P^\beta[M]$, respectively, in line 9. In each step of the Nelder-Mead search we take
n samples ($n \approx 500$) of $T_P^\beta[M]$, but with a fixed value of $\theta_i$ substituted into $T_P^\beta[M]$,
to compute the EWD with respect to samples of the original model M (which have
been cached to avoid redundant resampling). The Nelder-Mead search steps through
the parameter space (with step size $\eta > 0$), substituting different values of $\theta$ into
$T_P^\beta[M]$. This process continues until the search converges to a minimizing parameter,
p, that is within the stopping threshold $\epsilon > 0$ or encounters a runtime error during
the sampling (which is checked in line 12). As before, if we encounter such an error we
immediately return with the flag set to false (line 13). Following [14], we successively
restart the Nelder-Mead search from k evenly spaced grid points in $[0, 1]^d$ (hence the
loop in line 10), to find the globally optimal parameter (hence our approach is robust
to local minima), which we successively update in lines 15-16. If no runtime error was
ever encountered, we substitute the parameters with the minimum EWD over all
runs, $\hat{\theta}$, into the fully continuous program $T_P^\beta[P]$ and return (line 20). Though it can be
argued this sampling is potentially as difficult as the original inference, we reiterate
that we need only do this once offline, hence the cost is easily amortized.
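
The grid-restart strategy can be sketched independently of the particular local optimizer. In the sketch below a simple compass (pattern) search stands in for Nelder-Mead, and a toy one-dimensional objective with a local and a global minimum stands in for the EWD; both substitutions, as well as all function names, are assumptions made to keep the example dependency-free.

```python
from itertools import product

def compass_search(objective, start, step, tol, lower=0.0, upper=1.0):
    """Derivative-free local search; a stand-in for Nelder-Mead in this sketch."""
    point = list(start)
    best = objective(point)
    while step > tol:
        improved = False
        for i in range(len(point)):
            for delta in (step, -step):
                candidate = list(point)
                candidate[i] = min(upper, max(lower, candidate[i] + delta))
                value = objective(candidate)
                if value < best:
                    point, best, improved = candidate, value, True
        if not improved:
            step /= 2.0          # refine the mesh once no neighbour improves
    return point, best

def multi_start(objective, dim, grid_points, step=0.1, tol=1e-4):
    """Restart the local search from evenly spaced grid points in the unit cube."""
    axis = [(k + 0.5) / grid_points for k in range(grid_points)]
    best_point, best_value = None, float("inf")
    for start in product(axis, repeat=dim):
        point, value = compass_search(objective, start, step, tol)
        if value < best_value:   # keep the best minimizer across all restarts
            best_point, best_value = point, value
    return best_point, best_value

def toy_objective(theta):
    # Local minimum near 0.2 (value 0.05) and global minimum at 0.8 (value 0).
    return min((theta[0] - 0.2) ** 2 + 0.05, (theta[0] - 0.8) ** 2)

best_point, best_value = multi_start(toy_objective, dim=1, grid_points=4)
print(best_point, best_value)
```

Restarts that begin in the left half of the interval settle in the local minimum near 0.2, while those starting on the right reach the global minimum at 0.8, which is the one retained.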


6 Methodology 


6.1 Benchmarks 


Table 1 presents the benchmarks. For each benchmark, Columns 2 and 3 present the
original prior and likelihood type, respectively. Column 4 presents whether the
continuity correction was applied. Column 5 presents the time to continualize the program,
$T_{Cont.}$. As can be seen in Columns 4 and 5, the total continualization time, $T_{Cont.}$,
depends on whether parameters had to be synthesized. GPAExample had the longest
$T_{Cont.}$ at 3.6s, due to the complexity of the multiple predicates; however, these times
are amortized as our synthesis step is done only once.

As our problem has received little attention, no standard benchmark suites exist. 
In fact, to make inference tractable, for many models, developers would construct 
continuous approximations by hand, in an ad hoc fashion. However we wanted a 
benchmark suite that showcased all 3 inference scenarios that our approach works 
for: (1) discrete/hybrid prior and discrete/hybrid likelihood (2) continuous prior but 
discrete/hybrid likelihood and (3) discrete/hybrid prior but a continuous likelihood. 
Therefore, we obtained the benchmarks in two ways. First, we looked at variations 
of the mixed distributions benchmarks previously published in the machine learning 
community, e.g., [65,58], which served as the inspiration for our GPAExample. Sec- 
ond, we took existing benchmarks [27,30] for which designers modeled certain distri- 
butions with continuous approximations, and we retro-fitted these models with the 
corresponding discrete distributions. This step was done for Election, Fairness, 
SVMfairness, SVE, and TrueSkill. These discretizations were only applied where they 
made sense, e.g., the Gauss (np,np(1-p)) in the original Election program became dis- 
cretized as Binomial(n,p). We also took popular Bayesian models from Cognitive 
Science literature which use multiple discrete latent variables [39]; these models
are BetaBinomial and Exam. Lastly we took population models from the mathematical
biology literature [10, 4] to build benchmarks since populations are by nature discrete.
This was done for Plankton and DiscreteDisease. We present the original programs
in the appendix [38].


Table 1: Description of Benchmarks

Program            Prior         Likelihood    Correction?   T_Cont. (s)
GPAExample         Uniform       Discrete      ✓             3.643
Election [27]      DiscUniform   Bernoulli     ✓             1.139
Fairness [2]       DiscUniform   Bernoulli     ✓             1.809
SVMfairness [2]    Binomial      Continuous    ✓             1.578
TrueSkill [30]     Poisson       Bernoulli     ✓             1.149
DiscreteDisease    DiscUniform   Discrete      ✗             0.006
SVE [58]           Uniform       Hybrid        ✗             0.009
BetaBinomial [39]  Beta          Discrete      ✗             0.006
Exam [39]          Uniform       Discrete      ✗             0.008
Plankton [10]      DiscUniform   Discrete      ✗             0.006


Implementation We implemented Leios in Python (~4.5K LoC). All experiments 
were run on an Intel Xeon, multi-core desktop running Ubuntu 16.04 with a 3.7 GHz 
CPU and with 32GB RAM. All results are obtained from single-core executions. 


6.2 Experimental Setup 


Continualized Versions As there are no other general tools that automatically 
continualize probabilistic programs in mainstream languages, we compare Leios with: 


— Original Program: inference done in standard fashion on the original model, and 

— Naive Smoothing: inference done on a KDE style model in which Gaussian smooth- 
ing is applied only to the observed variable, but no approximations are applied to 
the inner latent variables. 


We will refer to these as simply “Original” and “Naive” respectively. 


Inference Accuracy Comparison using Ground Truth Our experimental
design compares the respective inference estimates with the ground truth. We set up the
experiments as follows: For each of the original discrete or hybrid programs $P$, we
replace the program variable corresponding to the prior distribution with a fixed value
$\tau$ (the ground truth) to obtain $P(\tau)$. We then sample $P(\tau)$ to obtain 25 observed
data points, which will be used to test inference performance on $P$, $P_{NS}$, and $P_{Leios}$
respectively. To test inference performance we then score $P$ (original program), $P_{NS}$
(naively smoothed program), and $P_{Leios}$ against the observed data points to infer the
posterior over the ground truth parameter $\tau$. Note the programs only have access to
the data samples, but not $\tau$.
For each of the 3 versions, $P$, $P_{NS}$, and $P_{Leios}$, we take the inferred posterior means
as the estimates $\tau_{est}$ of the value, and then compare them with the ground-truth value $\tau$ to
measure the error ratio $E = \left| \frac{\tau - \tau_{est}}{\tau} \right|$. This entire procedure is repeated for 10 different
values of $\tau$ to get a representative average of inference performance over a wide range
of true parameter values.


Table 2: Inference Times (s) and Error Ratios for each model, $\beta = 0.1$

Program            Original          Naive             Leios
                   Time    Error     Time    Error     Time    Error
GPAExample         0.806   0.090     0.631   0.070     0.605   0.058
Election           -       -         3.232   0.051     0.616   0.036
Fairness           4.396   0.057     0.563   0.056     0.603   0.093
SVMfairness        -       -         0.626   0.454     0.980   0.261
TrueSkill          3.668   0.009     0.494   0.059     0.586   0.053
DiscreteDisease    4.944   0.009     1.350   0.013     0.490   0.008
SVE                -       -         0.522   0.045     0.516   0.091
BetaBinomial       1.224   0.028     0.564   0.024     0.459   0.013
Exam               3.973   0.087     0.504   0.126     0.527   0.133
Plankton           0.570   0.017     0.457   0.080     0.453   0.042
Average            2.797   0.043     0.894   0.098     0.584   0.079


Analyzed Probabilistic Programming Systems. We used two languages in
our development: WebPPL [26] (with MCMC inference) and Pyro [8] (with
Variational Inference). Our implementation automatically generates WebPPL code for all
the programs. We used 3500 MCMC samples (with burn-in of 700 samples) in the
simulation. For Pyro, we only wanted to test fully-automatic black-box Variational
Inference, hence we did not manually marginalize out discrete variables (which is often
not even applicable, as the discrete variables are the ones we wish to estimate).


Inference Time Measurement We measure the time taken for inference for each 
version using built-in timers (which exclude file reading and warm-up). A timeout of 
10 minutes was used for the inference step. We used this same procedure for both 
MCMC-based sampling in WebPPL and Variational Inference in Pyro. 


7 Evaluation 


We study the following three research questions: 


RQ1 Can program continualization make inference faster, while still maintaining a 
high degree of accuracy, compared to the original program and naive smoothing? 


RQ2 How do performance and accuracy vary for different smoothing factors $\beta$?


RQ3 Can program continualization enable running transformed programs with off- 
the-shelf inference algorithms that cannot execute the original programs? 


7.1 RQ1: Benefits of Continualization 


Table 2 presents detailed timing and accuracy errors for a single smoothing factor $\beta$
on WebPPL programs. Columns 2 and 3 present the time and error (compared to the
ground truth) for the original program. Columns 4 and 5 present time/error for the
naive smoothing and Columns 6 and 7 present time/error for Leios.

From Table 2 we can see that on average, Leios leads to faster inference than
both the Original (no approximations) and Naive (0.584s vs 2.797s and 0.894s,
respectively). The Naive version was also faster than the Original, giving more evidence that
continuous models (even when just the observed variable is continualized) yield faster
inference.


Fig. 7: Inference Times and Error ratios for Leios and Naive for different $\beta$


For accuracy, inference performed via Leios was on average more accurate than
Naive (E = 0.079 vs. 0.098, respectively). Both were slightly less accurate than
inference performed on Original (E = 0.043). This is not unreasonable as Original has no
approximations applied (which are the main source of inference error). However, the
Original failed on Election, SVE, and SVMfairness. For Election, a large Binomial
latent led to a timeout, and it also slowed the Naive version relative to Leios (3.23s vs
0.61s). The Original failed on SVE since it is a hybrid discrete-continuous model (which
can make inference intractable [65, 6]). SVMfairness is a non-linear model where many
latent variables have high variances, leading to inference on the Original failing to
converge; Leios and Naive had higher error on this benchmark, for much the same reason
(though Leios was still significantly better than Naive, E = 0.261 vs 0.454).

Although Leios was faster than Original in all cases, for TrueSkill and SVMfairness, 
Leios was somewhat slower than Naive. This is likely because the discrete latent vari- 
ables in these benchmarks had small enough parameters (Binomial with small n). Sim- 
ilarly, for Fairness, Leios was slightly less accurate than Naive because the Gaussian 
approximation can be less accurate for smaller n. 
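
The loss of accuracy for small n can be illustrated with a minimal Python sketch. It compares the exact Binomial CDF against a moment-matched Gaussian using the textbook +0.5 continuity correction. This is only an illustration under that assumption: it is not the correction that Leios synthesizes, and all function names are hypothetical.

```python
import math

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def gauss_cdf(x, mu, sigma):
    """CDF of a Gaussian with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def approx_binom_cdf(k, n, p):
    """Moment-matched Gaussian approximation with a +0.5 continuity correction
    (textbook choice, assumed here for illustration only)."""
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))
    return gauss_cdf(k + 0.5, mu, sigma)

def max_cdf_error(n, p):
    """Largest absolute CDF gap between the Binomial and its approximation."""
    return max(abs(binom_cdf(k, n, p) - approx_binom_cdf(k, n, p))
               for k in range(n + 1))

small_n_error = max_cdf_error(5, 0.3)
large_n_error = max_cdf_error(200, 0.3)
```

In this sketch the worst-case gap for n = 5 is noticeably larger than for n = 200, which matches the qualitative explanation given above.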


7.2  RQ2: Impact of Smoothing Factors 


Figure 7 presents the average inference times and Error Ratios (ERs) for different smoothing factors 
β. In both subfigures, the X-axis represents the smoothing factor. The Y-axis of the left subfigure 
presents time, and the Y-axis of the right presents the error ratio compared to the ground 
truth (lower is better). 

Figure 7 (a) shows that inference on the programs constructed by Leios is non-trivially 
faster than inference on the naively smoothed version, regardless of the 
β used (which has a negligible effect on inference time for the β values we examined). 

Figure 7 (b) presents how accuracy directly depends on β. The Error Ratio for Leios 
reaches a local minimum when β = 0.1. Because Leios achieves “global” smoothing by 
approximating each latent, a larger value for β is not needed (unlike Naive). We also 
noticed that for many benchmarks, smaller β led to better continuity correction parameters, 
which also leads to better inference. Naive’s performance suffers for smaller β, which 
we attribute to small β creating a highly multimodal observed variable distribution 
(also presented in Section 2), which hampers inference [37,59]. Consequently, Naive 
performs best when β = 0.5; however, this β introduces non-trivially higher variance, 
which may often negatively affect the precision of inference. 

Table 3: Variational Inference Times (s) and Error Ratios for selected β 


                                                β = 0.25        β = 0.5         β = 0.75
Program         T_Org   E_Org   T_NS    E_NS    T_Leios E_Leios T_Leios E_Leios T_Leios E_Leios
GPAExample      -       -       -       -       3.111   0.207   3.341   0.241   3.435   0.321
Election        -       -       -       -       1.762   0.070   1.755   0.110   1.764   0.064
Fairness        -       -       -       -       1.813   0.722   1.827   0.769   1.830   0.753
SVMfairness     -       -       -       -       1.800   0.201   1.806   0.293   1.804   0.301
TrueSkill       -       -       -       -       1.809   0.119   1.802   0.062   1.790   0.090
DiscreteDisease -       -       -       -       1.734   0.248   1.731   0.471   1.747   0.553
SVE             0.677   0.684   1.478   3.095   1.471   0.587   1.460   0.566   1.448   0.348
BetaBinomial    -       -       -       -       1.605   0.834   1.596   0.708   1.587   0.497
Exam            -       -       -       -       0.603   0.222   0.602   0.213   0.603   0.285
Plankton        -       -       -       -       3.432   0.297   3.427   0.763   3.434   0.530


7.3 RQ3: Extending Results to Other Systems 


Table 3 presents the results for running translated programs in Pyro. Columns 2-5 
present the inference times and result errors for the original and naively smoothed pro- 
gram. These columns are “-” when Pyro cannot successfully perform inference (i.e. the 
model contains a discrete variable that is unsupported by the auto guide). Columns 6-11 
present Leios’ time and error for each model, for three different smoothing parameters. 

Fully-automated Variational Inference failed on all but one of the examples for 
both the Original and Naive. This is because in both cases the program still contains 
latent or observed discrete random variables. For most of the benchmarks (Election, 
GPA, TrueSkill) the program optimized with Leios had errors comparable to those 
computed previously with MCMC in WebPPL. For some the error was over 0.5 for all 
β (BetaBinomial, Fairness), which is in part a consequence of limitations of automatic 
VI, and hence for certain models manual fine-tuning may be unavoidable. These results 
illustrate that Leios can be used to create an efficient program in situations when the 
original language does not easily support non-continuous distributions. 


8 Related Work 


Probabilistic Program Synthesis To the best of our knowledge, we are the 
first to study program transformations that approximate discrete or hybrid discrete- 
continuous probabilistic programs with fully continuous ones to improve inference. 
Probabilistic program synthesis takes on the more ambitious task of generating probabilistic 
programs with certain properties directly from data. For instance, Nori et al. [51] 
aim to synthesize a probabilistic program given a program sketch and a data-set to 
fit the program to. However, their approach merely fits the distribution parameters to the sketch. 
Furthermore, their language lacks ‘==’ comparisons. Chasins et al. [11] take a similar 
approach but only apply continuous approximations to already continuous variables. 


Probabilistic Inference with Discrete and Hybrid Distributions Recent 
work [65,66] has explored developing languages and semantics to encode discrete-continuous 
mixtures. However, these all restrict the types of programs that can be 
expressed and require specialized inference algorithms. In contrast, Leios can work 
with a variety of off-the-shelf inference algorithms that operate on arbitrary models 
and does not need to define its own inference algorithm. In [66] the authors explored 
a restricted programming language that can statically detect which parameters the 
program’s density is discontinuous in. However, they did not address the question of 
continuous approximation; rather, their approach was to develop a custom inference 
scheme and restrict the language so that pathological models cannot be written (they 
also disallow ‘==’ predicates). In [65], Wu et al. develop a custom inference method for 
discrete-continuous mixtures, but only for models encodable as a Bayesian network. 
Furthermore, as pointed out by [47], the specialized inference method of Wu et al. is 
restrictive since it cannot be composed with other program transformations. 

Additionally, Machine Learning researchers have developed other continuous relax- 
ation techniques to address the inherent problems of non-differentiable models. One 
other popular method is to reparametrize the gradient estimator during Variational 
Inference (VI) computation, commonly called the “reparameterization trick” [42,61]. 
However, this approach suffers from the fact that not all distributions support such 
gradient reparameterizations, and the method is limited to Variational Inference. 
In contrast, our approach allows one to use any inference scheme. Further, 
even though these techniques have been attempted in the probabilistic programming 
setting [40], such work still inherits the aforementioned weaknesses. 

We also draw upon Kernel Density Estimation (KDE) [62], a common approxima- 
tion scheme in statistics. KDE fits a Kernel density to each observed data point, hence 
constructing a smooth approximation. Naive Smoothing is essentially a KDE (with 
a Gaussian Kernel) of the original while Leios employs additional continualizations. 
Furthermore, our smoothing factor β is analogous to the bandwidth of a KDE. 
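
To make the bandwidth analogy concrete, here is a minimal Python sketch of a Gaussian-kernel KDE over a handful of discrete observations. The data and function names are hypothetical and the code is not part of Leios; it only shows how a small bandwidth leaves the density sharply multimodal while a larger one smooths it.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a density: the average of Gaussian kernels centred on each sample."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                          for s in samples)

    return density

data = [0, 1, 1, 2, 5]                      # hypothetical discrete observations
narrow = gaussian_kde(data, bandwidth=0.1)  # nearly a sum of spikes
wide = gaussian_kde(data, bandwidth=0.5)    # smoother, but more spread out
```

Evaluating both densities halfway between two observations (for example at x = 0.5) shows the narrow-bandwidth density dropping close to zero there, whereas the wide-bandwidth density stays well above it.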


Program Analysis for Probabilistic Programs Multiple program analysis 
frameworks and systems have been developed for probabilistic programming [57, 33, 
63, 32, 22]. These analyses make use of a rich set of semantics [44, 36, 7, 
64, 19]. Of particular note is recent work by Lew et al. [41], which provides 
a type system for reasoning about variational approximations; however, they focus on 
continuous approximations of already continuous variables. 


Benefits of Continuity in Conventional Programs The idea of smoothing 
and working with continuous functions in non-probabilistic programs has found success 
in a variety of applications [21, 12, 34, 13]. Our work derives inspiration mainly from 
Smooth Interpretation [14], which provides a semantics for smoothing deterministic 
programs encoding a discontinuous or discrete function. 


9 Conclusion 


We presented Leios as a method for approximating probabilistic programs with fully 
continuous versions. Our approach shows that by continualizing probabilistic programs, 
it is possible to achieve substantial speed-ups in inference performance whilst still 
preserving a high degree of accuracy. To this effect we combined two key techniques: 
statement level program transformations to continualize latent variables and a novel 
continuity correction synthesis procedure to correct branch conditions. 

Abstract. We propose a denotational semantic framework for deterministic 
dataflow and stream processing that encompasses a variety of 
existing streaming models. Our proposal is based on the idea that data 
streams, stream transformations, and stream-processing programs should 
be classified using types. The type of a data stream is captured for- 
mally by a monoid, an algebraic structure with a distinguished binary 
operation and a unit. The elements of a monoid model the finite frag- 
ments of a stream, the binary operation represents the concatenation of 
stream fragments, and the unit is the empty fragment. Stream trans- 
formations are modeled using monotone functions on streams, which we 
call stream transductions. These functions can be implemented using 
abstract machines with a potentially infinite state space, which we call 
stream transducers. This abstract typed framework of stream transduc- 
tions and transducers can be used to (1) verify the correctness of stream- 
ing computations, that is, that an implementation adheres to the desired 
behavior, (2) prove the soundness of optimizing transformations, e.g. for 
parallelization and distribution, and (3) inform the design of program- 
ming models and query languages for stream processing. In particular, 
we show that several useful combinators can be supported by the full 
class of stream transductions and transducers: serial composition, paral- 
lel composition, and feedback composition. 


Keywords: Data streams · Denotational semantics · Type system 


1 Introduction 


Stream processing is the computational paradigm where the input is not pre- 
sented in its entirety at the beginning of the computation, but instead it is 
given in an incremental fashion as a potentially unbounded sequence of elements 
or data items. This paradigm is appropriate in settings where data is created 
continually in real-time and has to be processed immediately in order to ex- 
tract actionable insights and enable timely decision-making. Examples of such 
datasets are streams of business events in an enterprise setting [26], streams 
of packets that flow through computer networks [37], time-series data that is 
captured by sensors in healthcare applications [83], etc. 

Due to the great variety of streaming applications, there are various propos- 
als for specialized languages, compilers, and runtime systems that deal with the 
processing of streaming data. Relational database systems and SQL-based languages 
have been adapted to the streaming setting. 
Recently, several systems have been developed for the distributed processing of 
data streams that are based on the distributed dataflow model of computation 
[6, 7, 70, 86, 92, 94, 108, 112, 113]. Languages for detecting complex events 
in distributed systems, which draw on the theory of regular expressions and 
finite-state automata, have also been proposed [41, 50, 53, 88, 99, 111]. The 
synchronous dataflow formalisms are based on Kahn’s 
seminal work [59], and they have been used for exposing and exploiting task-level 
and pipeline parallelism within streaming computations in the context 
of embedded systems. Several formalisms for the runtime verification of reactive 
systems have been proposed, many of which are based on variants of 
Temporal Logic and its timed/quantitative extensions [39, 43, 52, 74, 105]. Finally, 
there is a large collection of languages and systems for reactive programming 
[34, 36, 38, 46, 47, 55, 68, 69, 77, 89, 93, 103], which focus on the development of 
event-driven and interactive applications such as GUIs and web programming. 

The aforementioned languages and systems have been successfully used in the 
application domains for which they were developed. However, each one of them 
typically introduces a unique variant of the streaming model in terms of: (1) the 
form of the input and output data, (2) the class of expressible stream-processing 
computations, and (3) the syntax employed to describe these computations. 
This has resulted in an enormous proliferation of semantic models for stream 
processing that are difficult to compare. For this reason, we are interested in 
identifying a semantic unification of several existing streaming models. 

This paper introduces a typed semantic framework for reasoning about 
languages and systems for stream processing. Three key questions are tackled: 
1. How do we model streams and what is the form of the data that they carry? 
2. How do we capture mathematically the notion of a stream transformation? 
3. What is a general programming model for specifying streaming computations? 
The first two questions concern the discovery of an appropriate denotational 
model for streaming computation. The third question concerns the design of 
programming and query languages, where a key requirement is that the behav- 
ior of a streaming program/query admits a precise mathematical description. 
Existing works have addressed these questions in the context of specific classes 
of applications. Here are examples of various perspectives: 

— Transductions of strings [5, 100, 104, 110]: A stream is viewed as an 
unbounded sequence of letters, and a stream transformation is a translation 
from input sequences to output sequences, which is typically called string/word 
transduction. These translations are commonly described using finite-state trans- 
ducers, a class of automata that extend acceptors with output. 

— The streaming dataflow model of Gilles Kahn [59, 60]: The input 
and output consist of multiple independent channels that carry unbounded se- 
quences of elements. A transformation is a function from a tuple of input se- 
quences to a tuple of output sequences. Such transformations are specified with 
dataflow graphs whose nodes describe single-process computations. 

— Relational transformations [71]: A stream is an unbounded multiset 
(bag) of tuples, and a stream transformation is a monotone operator (w.r.t. mul- 
tiset containment) on multisets. This can be generalized to consider more than 
one input stream. An interesting subclass of these operators can be described 
syntactically using monotone relational algebra. 

— Processing of time-varying relations [16, 17]: A stream is a time-varying 
finite multiset of tuples, i.e. an unbounded sequence of finite multisets of 
tuples. In this setting, a stream transformation processes the input in a way that 
preserves the notion of time: after processing t input multisets (i.e., t time units) 
the output consists of t output multisets. The query language CQL defines 
a class of such computations that involve relational and windowing operators. 


— Transformations of continuous-time signals [27]: An input stream 
is a continuous-time signal, that is, a function from the real numbers ℝ to an 
n-dimensional space ℝ^n. A stream transformation is a mapping from input signals 
to output signals that is causal, which means that the value of the output at time 
t depends on the values of the input signal up to (and including) time t. Systems of 
differential equations can be used to describe classes of such transformations. 

We are interested here in a unifying framework that encompasses all the 
aforementioned concrete instances of streaming models and enables formal rea- 
soning about the composition of streaming computations from different models. 
In order to achieve this we take an abstract algebraic approach that retains 
only the essential aspects of stream processing without any unnecessary special- 
ization. The rest of the section outlines our proposal. 


At the most fundamental level, stream processing is computation over input 
that is not given at the beginning in full, but rather is presented incrementally 
as the computation evolves. Since the input is presented piece by piece, the basic 
concepts that need to be captured mathematically are: (1) what is a piece or 
fragment of the input, and (2) how do we extend the input. The most general 
class of algebraic structures that model these notions is the class of monoids, 
the collection of algebras that have a distinguished binary associative multiplication 
operation · and an identity element 1 for this operation. A monoid 
(A, ·, 1) then constitutes a type of data streams, where the elements of the 
monoid are all the possible finite stream fragments, the identity 1 ∈ A is the 
empty stream fragment, and the multiplication operation · : A × A → A models 
the concatenation of stream fragments. Using monoids, we can organize several 
notions of data streams using types that describe the form of the data, as well 
as any invariants or assumptions about them. Monoids encompass the kinds of data 
streams that we mentioned earlier and many more: strings of letters, linear se- 
quences of data items, tuples of sequences, multisets (bags) of data items, sets 
of data items, time-varying relations/multisets, (potentially disordered) times- 
tamped sequences of data items, continuous-time signals, and so on. 

Stream transformations can be classified according to the type of their input 
and output streams, which we call a transduction type. They are modeled us- 
ing monotone functions that map an input stream history (i.e., the fragment of 
the input stream that has been received from the beginning of the computation 
until now) to an output stream history (i.e., the fragment of the output stream 
produced so far). The monotonicity requirement captures the idea that a stream 
transformation cannot retract the output that has already been emitted. We 
call such functions stream transductions, and we propose them as a deno- 
tational semantic model for stream processing. This model encompasses string 
transductions, non-diverging Kahn-computable functions on streams, monotone 
relational transformations [71], the CQL-definable transformations on 
time-varying relations, and transformations of continuous-time signals [27]. 
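
The monotonicity requirement can be checked on small cases. The following Python sketch assumes the stream type is finite sequences (Python lists) ordered by the prefix relation; the helper names are hypothetical, and the check only covers a finite set of test histories rather than proving monotonicity.

```python
from itertools import accumulate

def is_prefix(u, v):
    """u is a prefix of v: the history v extends the history u."""
    return v[:len(u)] == u

def running_sum(history):
    """Maps an input history to the output history of partial sums."""
    return list(accumulate(history))

def reversal(history):
    """Reverses the history; later input changes earlier output."""
    return history[::-1]

def is_monotone_on(f, histories):
    """Check that f(u) is a prefix of f(v) whenever u is a prefix of v."""
    return all(is_prefix(f(u), f(v))
               for u in histories for v in histories if is_prefix(u, v))

tests = [[], [3], [3, 1], [3, 1, 4], [1, 4]]
```

On these histories `running_sum` passes the check, while `reversal` fails it: extending the input `[3]` to `[3, 1]` would require retracting the already emitted `3`.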


We also introduce an abstract model of computation for stream processing. 
The considered programs or abstract machines are called stream transduc- 
ers, and they are organized using transducer types that specify the input and 
output stream types. A stream transducer processes the input stream in an in- 
cremental fashion, by consuming it fragment by fragment. The consumption of 
an input fragment results in the emission of an output fragment. Our algebraic 
setting brings in an unavoidable complication compared to the classical theory 
of word transducers: not all stream transducers describe a stream transduction. 
This phenomenon has to do with the generalization of the input and output data 
streams from sequences of atomic data items to elements of arbitrary monoids. 
A stream transducer has to respect its input/output type, which means that the 
way in which the input stream is fragmented into pieces and fed to the trans- 
ducer does not affect the cumulative output. More concisely, this says that the 
cumulative output is independent from the fragmentation of the input. In order 
to formalize this notion, we say that a factorization of an input history u is a 
sequence of stream fragments u₁, u₂, ..., uₙ whose concatenation is equal to the 
input history, i.e. u₁ · u₂ ⋯ uₙ = u. Now, the desired restriction can be described 
as follows: for every input history w and any two factorizations u₁, ..., uₘ and 
v₁, ..., vₙ of w, the cumulative output that the transducer emits when consuming 
the fragments u₁, ..., uₘ in sequence is equal to the cumulative output when 
consuming the fragments v₁, ..., vₙ. Fortunately, this complex property can be 
distilled into an equivalent property on the structure of the stream transducer 
that we call the coherence property. Every stream transducer that is coherent has 
a well-defined semantics or denotation in terms of a stream transduction. 
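
The fragmentation-independence requirement can be tested by brute force on small inputs. The Python sketch below assumes list-valued input and output streams and models a transducer as a hypothetical `step(state, fragment)` function returning a new state and an emitted output fragment. It enumerates only factorizations into non-empty fragments, and it checks the semantic requirement directly rather than the structural coherence property itself.

```python
def run(step, init_state, fragments):
    """Feed fragments in order; return the concatenation of emitted outputs."""
    state, out = init_state, []
    for frag in fragments:
        state, emitted = step(state, frag)
        out += emitted
    return out

def sum_step(total, frag):
    """Emits one running sum per input item; the state is the sum so far."""
    emitted = []
    for x in frag:
        total += x
        emitted.append(total)
    return total, emitted

def batch_step(count, frag):
    """Emits the number of fragments consumed so far (fragmentation-dependent)."""
    return count + 1, [count + 1]

def factorizations(history):
    """All splits of a list into a sequence of non-empty contiguous fragments."""
    if not history:
        yield []
        return
    for i in range(1, len(history) + 1):
        for rest in factorizations(history[i:]):
            yield [history[:i]] + rest

def fragmentation_independent(step, init_state, history):
    """True if every factorization yields the same cumulative output."""
    outputs = {tuple(run(step, init_state, f)) for f in factorizations(history)}
    return len(outputs) == 1
```

For the history `[1, 2, 3]`, `sum_step` produces the same cumulative output for every factorization, whereas `batch_step` does not, so only the former could denote a stream transduction.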


We have already outlined the basics of our general framework for streaming 
computation, which includes: (1) a classification of streams using monoids as 
types, (2) a denotational semantic model that employs monotone functions from 
input histories to output histories, and (3) a programming model that generalizes 
transducers to compute meaningfully on elements of arbitrary monoids. This 
already allows us to address important questions about specific computations: 


— Does a streaming program (transducer) behave as intended? This amounts 
to checking whether the denotation of the transducer is the desired function. 

— Are two streaming programs (transducers) equivalent? This means that their 
denotations in terms of stream transductions are the same. 

The first question is a correctness property. The second question is relevant for 
semantics-preserving program optimization. We will turn now to the issue of how 
to modularly specify complex stream transductions and transducers. 

One of the most common ways to conceptually organize complex streaming 
computations is to view the overall computation as the composition of several 
processes that run independently and are connected with directed communi- 
cation channels on which streams of data flow. This way of structuring com- 
putations is called the dataflow programming model. The simple deterministic 
parallel model of Karp and Miller is one of the first variants of dataflow, 
and other notable early works on dataflow models include Dennis’s parallel lan- 
guage of actors and links and Kahn’s networks of computing stations and 
communication lines. We investigate three key dataflow combinators for com- 
posing stream transductions (i.e., semantic-level) and stream transducers (i.e., 
program-level): serial composition, parallel composition, and feedback com- 
position. Serial composition is useful for describing pipelines of processing stages, 
where the output of one stage is streamed as input into the next stage. Parallel 
composition describes the independent and concurrent computation of two or 
more components. Feedback composition supports computations whose current 
output depends on previously produced outputs. We show that our framework 
supports all these combinators, which facilitate the modular description of com- 
plex computations and expose pipeline and task-based parallelism. 
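
At the semantic level, serial and parallel composition can be sketched in a few lines of Python, treating a stream transduction as a plain function from input histories to output histories. The sketch assumes list-valued streams and uses hypothetical names; feedback composition is omitted because it needs more machinery than fits in a short example.

```python
from itertools import accumulate

def serial(f, g):
    """Pipeline: the output history of f is the input history of g."""
    return lambda history: g(f(history))

def parallel(f, g):
    """Independent components acting on a pair of input histories."""
    return lambda pair: (f(pair[0]), g(pair[1]))

def running_sum(history):
    """Output history of partial sums."""
    return list(accumulate(history))

def evens_only(history):
    """Filters the history, keeping even items in arrival order."""
    return [x for x in history if x % 2 == 0]

pipeline = serial(evens_only, running_sum)
both = parallel(running_sum, evens_only)
```

Here `pipeline([1, 2, 3, 4])` filters first and then sums, and `both(([1, 2], [3, 4]))` processes the two channels independently.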


Outline of paper. In Sect. 2 we introduce the idea that data streams can be 
classified using monoids as their types, and in Sect. 3 we propose the semantic 
model of stream transductions. Sect. 4 is devoted to the description of an abstract 
model of streaming computation, called stream transducer, and the main 
properties that it satisfies. In Sect. 5 we show that our abstract model is closed 
under a fundamental set of dataflow combinators: serial, parallel, and feedback 
composition. In Sect. 6 we prove the soundness of a streaming optimizing transformation 
using denotational arguments and algebraic rewriting. Sect. 7 contains 
related work, and Sect. 8 concludes with a brief summary of our proposal. 


2 Monoids as Types for Streams 


Data streams are typically viewed as unbounded linear sequences of data items, 
where a data item can be thought of as a small indivisible piece of data. This 
viewpoint is sufficient for describing many useful semantic and programming 
models, but it is too concrete and unnecessarily restricts the notion of a data 
stream. In order to see this, consider a computation where the specific order in 
which the data items arrive is not relevant. Counting is a trivial example of such 
a computation, and it can be described operationally as follows: every time a 
new data item arrives, the counting stream algorithm emits the total number of 
items that have been seen so far. This can be described mathematically by the 
function β, given by β(⟨x₁, x₂, ..., xₙ⟩) = ⟨1, 2, ..., n⟩, where ⟨x₁, x₂, ..., xₙ⟩ 
is the input and ⟨1, 2, ..., n⟩ is the cumulative output of the computation. For 
this computation, the input can be meaningfully viewed as a multiset (or bag) 
instead of a sequence, since the ordering of the data items is irrelevant. This 
means that multisets can also be viewed as data streams, and in some cases this 
viewpoint is preferable to the traditional one of “streams = sequences”. 
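
As a small illustration, the following Python sketch (with hypothetical names) implements the counting computation on lists and confirms that every arrival order of the same items yields the same cumulative output, which is why the input may be regarded as a multiset.

```python
from itertools import permutations

def count_so_far(history):
    """One running count per arrived item: n items yield 1, 2, ..., n."""
    return list(range(1, len(history) + 1))

items = ["a", "b", "c"]
# Collect the cumulative output for every possible arrival order.
outputs = {tuple(count_so_far(list(p))) for p in permutations(items)}
```

The set `outputs` contains a single element, so the result does not depend on the order of arrival.
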

The example of the previous paragraph raises an obvious question: What 
class of mathematical objects can meaningfully serve as data streams? Linear 
sequences and multisets should certainly be included, but it would be desirable 
to generalize the notion of streams as much as possible. Recent works explore the 
idea of generalizing streams to encompass a large class of partial orders [13, 85], 
but we will see later that this approach excludes many useful instances. Stream 
processing is the computational paradigm where the input is not presented in 
full at the beginning of the computation, but instead it is given in an incremental 
fashion or piece by piece. For this reason, there are just three notions that need 
to be modeled mathematically: (1) a fragment or piece of a data stream, (2) 
the extension of data with an additional fragment of data, and (3) the empty 
data stream, i.e. the data seen at the very beginning of the computation. This 
leads us to consider a kind or type of a data stream as an algebraic structure that 
satisfies the following: (1) its elements model data stream fragments, (2) it has a 
distinguished associative operation · for the concatenation of stream fragments, 
and (3) it has a distinguished element 1 that represents the empty fragment so 
that 1 is a unit for concatenation. The class of monoids is the largest class of 
algebraic structures that fulfill these requirements. 

More formally, a monoid is an algebraic structure (A, ·, 1), where · : A × A → 
A is a binary operation called multiplication and 1 ∈ A is a constant called unit, 
that satisfies the following two axioms: (I) (x · y) · z = x · (y · z) for all x, y, z ∈ A, 
and (II) 1 · x = x · 1 = x for all x ∈ A. The first axiom says that · is associative, 
and the second axiom says that 1 is a left and right identity for the · operation. 
For brevity, we will sometimes write xy to denote x · y. 
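
The two axioms can be spot-checked mechanically. The Python sketch below uses a hypothetical `Monoid` record holding a multiplication and a unit, and tests associativity and the identity law on a finite sample of elements. Passing such a test is evidence rather than proof, but failing it is a genuine counterexample.

```python
from dataclasses import dataclass
from itertools import product
from typing import Any, Callable

@dataclass(frozen=True)
class Monoid:
    op: Callable[[Any, Any], Any]   # the multiplication
    unit: Any                       # the unit 1

def satisfies_axioms(m, sample):
    """Check axioms (I) and (II) on every combination drawn from sample."""
    assoc = all(m.op(m.op(x, y), z) == m.op(x, m.op(y, z))
                for x, y, z in product(sample, repeat=3))
    ident = all(m.op(m.unit, x) == x == m.op(x, m.unit) for x in sample)
    return assoc and ident

strings = Monoid(op=lambda x, y: x + y, unit="")      # concatenation
subtraction = Monoid(op=lambda x, y: x - y, unit=0)   # not associative
```

String concatenation with the empty string passes the check, while integer subtraction fails associativity, for example (1 - 1) - 1 differs from 1 - (1 - 1).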

Suppose that A is a monoid. We write A* for the set of all finite sequences of 
elements of A and ε for the empty sequence. The finite multiplication function 
π : A* → A is given by π(ε) = 1 and π(x̄ · ⟨y⟩) = π(x̄) · y for x̄ ∈ A* and y ∈ A. 
For sequences x̄, ȳ ∈ A*, it holds that π(x̄ · ȳ) = π(x̄) · π(ȳ). So, π generalizes 
the binary multiplication · to a finite but arbitrary number of arguments. 
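
The recursive definition of finite multiplication is a left fold, so in Python it can be sketched with `functools.reduce`. The function name `finite_product` and the choice of string concatenation as the example monoid are assumptions made for illustration.

```python
from functools import reduce

def finite_product(op, unit, xs):
    """Left fold: the empty sequence maps to the unit, and appending y
    multiplies the product so far by y on the right."""
    return reduce(op, xs, unit)

concat = lambda x, y: x + y      # string concatenation, with unit ""
xs, ys = ["ab", "c"], ["", "de"]

whole = finite_product(concat, "", xs + ys)
parts = concat(finite_product(concat, "", xs), finite_product(concat, "", ys))
```

The values `whole` and `parts` coincide, which is an instance of the stated identity relating the product of a concatenated sequence to the products of its two halves.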

Let (A, ·_A, 1_A) and (B, ·_B, 1_B) be monoids. Their product is the monoid (A × 
B, ·, 1), where the multiplication operation is given by (x, y) · (x′, y′) = (x ·_A 
x′, y ·_B y′) for x, x′ ∈ A and y, y′ ∈ B, and the identity is 1 = (1_A, 1_B). 
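
A product of two stream types models a pair of independent channels. The Python sketch below builds the componentwise multiplication from two given operations; the example pairs strings under concatenation with integers under addition, and all names are hypothetical.

```python
def product_op(op_a, op_b):
    """Componentwise multiplication on pairs."""
    return lambda p, q: (op_a(p[0], q[0]), op_b(p[1], q[1]))

concat = lambda x, y: x + y      # strings under concatenation, unit ""
add = lambda m, n: m + n         # integers under addition, unit 0

pair_op = product_op(concat, add)
pair_unit = ("", 0)              # the pair of the two units
```

For example, `pair_op(("ab", 2), ("c", 3))` extends each channel separately, and `pair_unit` leaves any pair unchanged on either side.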

A monoid homomorphism from a monoid (A, ·, 1) to a monoid (B, ·, 1) 
is a function h : A → B that commutes with the monoid operations, that is, 
h(1) = 1 and h(x · y) = h(x) · h(y) for all x, y ∈ A. 
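
A familiar homomorphism is word length, which maps strings under concatenation to the natural numbers under addition. The Python sketch below checks both homomorphism conditions on a finite sample; the checker name is hypothetical and the test is a spot check, not a proof.

```python
from itertools import product

def is_homomorphism(h, op_a, unit_a, op_b, unit_b, sample):
    """Check that h preserves the unit and the multiplication on a sample."""
    preserves_unit = h(unit_a) == unit_b
    preserves_op = all(h(op_a(x, y)) == op_b(h(x), h(y))
                       for x, y in product(sample, repeat=2))
    return preserves_unit and preserves_op

concat = lambda x, y: x + y
add = lambda m, n: m + n
words = ["", "a", "bc", "abc"]

length_ok = is_homomorphism(len, concat, "", add, 0, words)
shifted_ok = is_homomorphism(lambda w: len(w) + 1, concat, "", add, 0, words)
```

The length function passes, while the shifted variant fails already on the unit, since it sends the empty word to 1 instead of 0.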

As we discussed earlier, we can think of a monoid as a type of data streams. 
The elements of the monoid represent finite stream fragments. The multiplication 
operation · models the concatenation of stream fragments, and the unit of the 
monoid is the empty stream fragment. 

For a monoid (A, ·, 1) we define the binary relation ⪯ as follows: for all 
x, y ∈ A, we put x ⪯ y if and only if xz = y for some z ∈ A. Since the relation 
⪯ is reflexive and transitive, we call it the prefix preorder for the monoid 
A. The unit 1 is a minimal element w.r.t. the ⪯ relation: 1 · x = x and hence 
1 ⪯ x for every x ∈ A. Define the function prefix : A × A → 𝒫(A) as follows: 
prefix(x, y) = {z ∈ A | xz = y} for all x, y ∈ A. This implies that x ⪯ y iff 
prefix(x, y) ≠ ∅. In other words, prefix(x, y) is the set of all witnesses for x ⪯ y. 
A partial function ∂ : A × A ⇀ A is said to be a prefix witness function (or 
simply a witness function) for the monoid A if its domain is equal to ⪯ and it 
satisfies: ∂(x, y) ∈ prefix(x, y) for every x, y ∈ A with x ⪯ y. We can express this 
equivalently by requiring that the type of the function ∂ is ∏_{(x,y) ∈ ⪯} prefix(x, y). 

We say that a monoid A satisfies the left cancellation property if xy = xz 
implies y = z for all x, y, z ∈ A. In this case we say that A is left-cancellative. If 
A is left-cancellative, then it has a unique prefix witness function, because x ⪯ y 
implies that there is a unique z with xz = y. 


Example 1 (Finite Sequences). Consider the algebra (FSeq(A), ·, ε), where 
FSeq(A) is the set A* of all finite words (strings) over a set A, · is word concatenation, 
and ε is the empty word. This algebra is a monoid. In fact, it is the free 
monoid with generators A. For u, v ∈ A*, u ⪯ v iff the word u is a prefix of the 
word v. There is a unique prefix witness function, because for every x, y ∈ A* 
with x ⪯ y there is a unique z ∈ A* such that xz = y. 
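
For finite sequences the prefix preorder and its unique witness function are easy to write down. The Python sketch below represents words as strings; the function names are hypothetical.

```python
def is_prefix(x, y):
    """x precedes y in FSeq(A): some z satisfies x + z == y."""
    return y[:len(x)] == x

def witness(x, y):
    """The unique z with x + z == y; defined only when x is a prefix of y."""
    if not is_prefix(x, y):
        raise ValueError("x is not a prefix of y")
    return y[len(x):]

remainder = witness("ab", "abcd")
```

Here `remainder` is the suffix of `"abcd"` that is left after removing the prefix `"ab"`, and concatenating it back onto `"ab"` recovers `"abcd"`.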


Let us consider now a variant of Example 1 in order to clear any misunderstandings 
regarding the ⪯ order. The set A*, together with the empty sequence 
ε, and the operation ∘ given by x ∘ y = yx is a monoid. For the monoid (A*, ∘, ε), 
we have that x ⪯ y iff x ∘ z = zx = y for some z ∈ A*. So, x ⪯ y iff the word x 
is a suffix of the word y. 
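
The reversed-concatenation variant can be sketched in Python in the same way, again with strings as words and hypothetical function names. The only change is the multiplication, yet the induced preorder becomes the suffix relation.

```python
def rev_op(x, y):
    """Reversed concatenation: the second argument is placed first."""
    return y + x

def precedes(x, y):
    """x precedes y for this monoid: rev_op(x, z) == z + x == y for some z,
    which holds exactly when x is a suffix of y."""
    return y.endswith(x)

def witness(x, y):
    """The unique z with rev_op(x, z) == y; requires precedes(x, y)."""
    if not precedes(x, y):
        raise ValueError("x is not a suffix of y")
    return y[:len(y) - len(x)]
```

For instance, `"cd"` precedes `"abcd"` with witness `"ab"`, whereas `"ab"` does not precede `"abcd"` in this monoid.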


Example 2 (Finite Multisets). Consider the algebra (FBag(A), ∪, ∅), where 
FBag(A) is the set of all finite multisets (bags) over a set A, ∪ is multiset 
union, and ∅ is the empty multiset. This algebra is a monoid. In fact, it is 
the free commutative monoid with generators A. It is also left-cancellative. For 
x, y ∈ FBag(A), x ⪯ y iff x is contained in y. So, we also use the notation 
⊆ instead of ⪯. There is a unique prefix witness function, because for every 
x, y ∈ FBag(A) with x ⊆ y there is a unique z ∈ FBag(A) such that xz = y. 
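
Finite multisets can be sketched in Python with `collections.Counter`, whose `+` adds multiplicities and whose `-` subtracts them while dropping non-positive counts. Treating multiset union as addition of multiplicities is the assumption made here, and the helper names are hypothetical.

```python
from collections import Counter

def contained(x, y):
    """x is contained in y: each multiplicity in x is at most that in y."""
    return all(y[a] >= n for a, n in x.items())

def witness(x, y):
    """The unique z with x + z == y (multiset difference); requires containment."""
    if not contained(x, y):
        raise ValueError("x is not contained in y")
    return y - x

x = Counter("ab")        # one a, one b
y = Counter("aabbc")     # two a, two b, one c
z = witness(x, y)
```

Adding `z` back to `x` recovers `y`, and because multiplicities subtract uniquely there is no other choice of witness.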


Example 3 (Finite Sets). Let A be a set. Consider the algebra (FSet(A), ∪, ∅), 
where FSet(A) is the set of all finite subsets of A, ∪ is set union, and ∅ is the 
empty set. This algebra is a monoid. In fact, it is the free commutative idempotent 
monoid with generators A. For x, y ∈ FSet(A), x ⪯ y iff x is contained in y. So, 
we also use the notation ⊆ instead of ⪯. 

For x ⊆ y, define ∂(x, y) = y \ x, where \ is the set difference operation. 
Since x ∪ (y \ x) = y for x ⊆ y, ∂ is a prefix witness function. We also define 
T(x, y) = y for x ⊆ y. Since x ∪ y = y for x ⊆ y, T is a prefix witness function. 
So, FSet(A) has several distinct prefix witness functions. 
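
The two witness functions for finite sets can be compared directly in Python using `frozenset`. The names `diff_witness` and `whole_witness` are hypothetical labels for the two functions defined in the example.

```python
def diff_witness(x, y):
    """First witness: the set difference of y and x."""
    return y - x

def whole_witness(x, y):
    """Second witness: simply y itself."""
    return y

x, y = frozenset({1}), frozenset({1, 2, 3})

via_diff = x | diff_witness(x, y)
via_whole = x | whole_witness(x, y)
```

Both unions equal `y` although the two witnesses differ. This is possible because set union is not left-cancellative: {1} ∪ {2} and {1} ∪ {1, 2} are the same set.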


Example 4 (Finite Maps). Let K be a set of keys, and V be a set of values.
Consider the algebra (FMap(K, V), ·, ∅), where FMap(K, V) is the set of all
partial maps K → V with a finite domain, ∅ is the partial map with empty domain,
and · is defined as follows:

    (f · g)(k) = g(k),       if g(k) is defined
                 f(k),       if g(k) is undefined and f(k) is defined
                 undefined,  otherwise

for every f, g ∈ FMap(K, V) and k ∈ K. We leave it to the reader to check that
∅ · f = f · ∅ = f and (f · g) · h = f · (g · h) for all f, g, h ∈ FMap(K, V). So, the
algebra FMap(K, V) is a monoid.

Let f, g ∈ FMap(K, V). We write dom(f) = {k ∈ K | f(k) is defined} for the
domain of f. It holds that dom(f · g) = dom(f) ∪ dom(g). Using this property,
we see that f ⪯ g iff dom(f) ⊆ dom(g).

Let f, g ∈ FMap(K, V) with f ⪯ g. Define ∂(f, g) = g. Since dom(f) ⊆
dom(g), we have that f · ∂(f, g) = g. It follows that ∂ is a prefix witness function.
Define g \ f ∈ FMap(K, V) as follows:

    (g \ f)(k) = g(k),       if g(k) is defined and f(k) is undefined
                 g(k),       if g(k), f(k) are defined and g(k) ≠ f(k)
                 undefined,  otherwise

for every k ∈ K. From f ⪯ g we get f · (g \ f) = g. So, \ is a prefix witness function.
This means that FMap(K, V) has several distinct prefix witness functions.
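
As an illustration of Example 4, the override product and both witness functions can be sketched with Python dictionaries. This is only a sketch under the assumption that a `dict` models a finite partial map; the names `compose`, `witness_whole` and `witness_minimal` are hypothetical.

```python
# Sketch of the monoid (FMap(K, V), ·, ∅) with Python dicts (an assumption).

def compose(f: dict, g: dict) -> dict:
    """(f · g)(k) is g(k) when defined, otherwise f(k)."""
    return {**f, **g}

def is_prefix(f: dict, g: dict) -> bool:
    """f ⪯ g iff dom(f) ⊆ dom(g)."""
    return set(f) <= set(g)

def witness_whole(f: dict, g: dict) -> dict:
    """The witness ∂(f, g) = g."""
    return dict(g)

def witness_minimal(f: dict, g: dict) -> dict:
    """The witness g \\ f: keep only the entries of g that f lacks or disagrees on."""
    return {k: v for k, v in g.items() if k not in f or f[k] != v}

f = {"a": 1, "b": 2}
g = {"a": 1, "b": 5, "c": 3}
assert is_prefix(f, g)
assert compose(f, witness_whole(f, g)) == g
assert compose(f, witness_minimal(f, g)) == g
```

Here `witness_minimal(f, g)` omits the entry for `"a"`, so the two witnesses differ even though both extend f to g.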


Example 5 (Bounded-Domain Continuous-Time Signals). Let A be an
arbitrary set, and R be the set of real numbers. A bounded-domain continuous-time
signal with values in A is a function f : [0, u) → A where u ≥ 0 is a real
number and [u, v) = {t ∈ R | u ≤ t < v}. We define the concatenation operation
· for such signals as follows: for f : [0, u) → A and g : [0, v) → A, the signal
f · g : [0, u + v) → A is given by

    (f · g)(t) = f(t),      if t ∈ [0, u)
                 g(t − u),  if t ∈ [u, u + v)

We write BSig(A) for the set of all these bounded-domain continuous-time signals.
The unit signal is the unique function of type [0, 0) → A, whose domain of
definition is empty. Observe that BSig(A) is a monoid. For signals f : [0, u) → A
and g : [0, v) → A, it holds that f ⪯ g iff u ≤ v and f(t) = g(t) for every
t ∈ [0, u). There is a unique prefix witness function, because for every
f, g ∈ BSig(A) with f ⪯ g there is a unique h ∈ BSig(A) such that f · h = g.


Example 6 (Timed Finite Sequences). We write N to denote the set of natural
numbers (non-negative integers). A timed sequence over A is an alternating
sequence s₀a₁s₁a₂s₂ … aₙsₙ, where sᵢ ∈ N and aᵢ ∈ A for every i. The occurrences
s₀, s₁, … are called time punctuations and indicate the passage of time. So, the
set of all timed sequences over A is equal to TFSeq(A) = N · (A · N)*. We define the
fusion product ∘ of timed sequences as follows:

    s₀a₁s₁ … aₘsₘ ∘ t₀b₁t₁ … bₙtₙ = s₀a₁s₁ … aₘ(sₘ + t₀)b₁t₁ … bₙtₙ.

The unit timed sequence is the singleton sequence 0. The algebra (TFSeq(A), ∘, 0)
is easily shown to be a monoid. There is a unique prefix witness function, because
for all x, y ∈ TFSeq(A) with x ⪯ y there is a unique z ∈ TFSeq(A) s.t. x ∘ z = y.
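
The fusion product and its unique prefix witness can be made concrete. The sketch below assumes that a timed sequence is encoded as a Python tuple alternating non-negative integers and items, starting and ending with an integer; the encoding and the names `fuse` and `witness` are illustrative choices, not part of the text.

```python
# Sketch of (TFSeq(A), ∘, 0): a timed sequence is a tuple (s0, a1, s1, ..., an, sn).

UNIT = (0,)  # the singleton sequence 0

def fuse(s: tuple, t: tuple) -> tuple:
    """Fusion product: add the last punctuation of s to the first punctuation of t."""
    return s[:-1] + (s[-1] + t[0],) + t[1:]

def witness(x: tuple, y: tuple):
    """Return the unique z with fuse(x, z) == y, or None if x is not a prefix of y."""
    n = len(x)
    if len(y) < n or x[:-1] != y[:n - 1] or y[n - 1] < x[-1]:
        return None
    return (y[n - 1] - x[-1],) + y[n:]

x = (2, "a", 1)
y = (2, "a", 4, "b", 0)
z = witness(x, y)
assert fuse(x, z) == y
```

The witness subtracts the elapsed time already recorded at the end of x, which is why it is uniquely determined.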


Example 7 (Finite Time-Varying Multisets). A finite time-varying multiset
over A is a partial function f : N → FBag(A) whose domain is equal to
[0..n] = {0, …, n} for some integer n ≥ 0. We also use the notation f : [0..n] →
FBag(A) to convey this information regarding the domain of f. We define the
concatenation operation · for finite time-varying multisets as follows: for
f : [0..m] → FBag(A) and g : [0..n] → FBag(A), the concatenation
f · g : [0..m + n] → FBag(A) is given by

    (f · g)(t) = f(t),         if t ∈ [0..m − 1]
                 f(m) ∪ g(0),  if t = m
                 g(t − m),     if t ∈ [m + 1..m + n]

We write TFBag(A) to denote the set of all finite time-varying multisets over A.
The unit time-varying multiset Id : [0..0] → FBag(A) is given by Id(0) = ∅. It is
easy to see that f · Id = f and that Id · f = f for every f : [0..n] → FBag(A).
We leave it to the reader to also verify that (f · g) · h = f · (g · h) for finite
time-varying multisets f, g and h. So, the set TFBag(A) together with · and Id
is a monoid. It is not difficult to show that it is left-cancellative.

Let us now consider the prefix preorder ⪯ on finite time-varying multisets.
For f : [0..m] → FBag(A) and g : [0..n] → FBag(A), it holds that f ⪯ g iff
m ≤ n, f(t) = g(t) for every t ∈ [0..m − 1], and f(m) ⊆ g(m).


The examples above highlight the variety of mathematical objects that can
be meaningfully viewed as streams. These streams can be organized elegantly
using the structure of monoids. The sequences of Example 1, the multisets of
Example 2, and the finite time-varying multisets of Example 7 can be described
equivalently in terms of the partial orders of [13][85], which have also been
suggested as an approach to unify notions of streams. Using partial orders it is
also possible to model the timed finite sequences of Example 6, but only with a
non-succinct encoding: every time punctuation t ∈ N is encoded with a sequence
11…1 of t punctuations, one for each time unit. Partial orders cannot encode
the sets of Example 3, the maps of Example 4, or the signals of Example 5.
Informally, the reason for this is that partial orders can only encode commutation
equations, which are insufficient for objects such as sets and maps.


3 Stream Transductions 


In this section we will introduce stream transductions as semantic denotational
models of stream transformations. At any given point in a streaming computation,
we have seen an input history (the part of the stream from the beginning
of the computation until now) and we have produced an output history (the
cumulative output that has been emitted from the beginning until now). As a
first approximation, a streaming computation can be described mathematically
by a function β : A → B, where A and B are monoids that describe the input
and output type respectively, which maps an input history x ∈ A to an output
history β(x) ∈ B. The function β has to be monotone because the output is
cumulative, which means that it can only be extended with more output items
as the computation proceeds. An equivalent way to understand the monotonicity
property is that it captures the idea that any output that has already been emitted
cannot be retracted. Since β takes an entire input history as its argument,
it can describe stateful computations, where the output that is emitted at every
step potentially depends on the entire input history.


Definition 8 (Stream Transduction & Incremental Form). Let A and B
be monoids. A function β : A → B is said to be monotone (with respect to the
prefix preorder) if x ⪯ y implies β(x) ⪯ β(y) for all x, y ∈ A. For a monotone
β : A → B, we say that the partial function μ is a monotonicity witness function
if it maps elements x, y ∈ A and z ∈ prefix(x, y) witnessing that x ⪯ y to a
witness μ(x, y, z) ∈ prefix(β(x), β(y)) for β(x) ⪯ β(y). That is, we require that
the type of μ is ∏_{x,y∈A} prefix(x, y) → prefix(β(x), β(y)). So, the defining property
of μ is that for all x, y, z ∈ A with xz = y it holds that β(x) · μ(x, y, z) = β(y).
For brevity, we will sometimes write μ(x, z) to denote μ(x, xz, z). The defining
property of μ is then written as β(x) · μ(x, z) = β(xz) for all x, z ∈ A.

A stream transduction from A to B is a function β : A → B that is monotone
with respect to the prefix preorder, together with a monotonicity witness
function μ : ∏_{x,y∈A} prefix(x, y) → prefix(β(x), β(y)). We write STrans(A, B) to
denote the set of all stream transductions from A to B.

The incremental form of a stream transduction (β, μ) ∈ STrans(A, B) is a
function F(β, μ) : A* → B*, which is defined inductively by F(β, μ)(ε) = ⟨β(1)⟩
and F(β, μ)(⟨x₁, …, xₙ, xₙ₊₁⟩) = F(β, μ)(⟨x₁, …, xₙ⟩) · ⟨μ(x₁ ⋯ xₙ, xₙ₊₁)⟩ for
every sequence ⟨x₁, …, xₙ₊₁⟩ ∈ A*.


Consider the stream transduction (β, μ) : STrans(A, B) and the input fragments
x, y ∈ A. Notice that μ(x, y) gives the output increment that the streaming
computation generates when the input history x is extended into xy. For an
arbitrary output monoid B, the output increment μ(x, y) is generally not uniquely
determined by β(x) and β(xy). This means that the monotonicity witness function
μ generally provides some additional information about the streaming computation
that cannot be obtained purely from β. However, if the output monoid
B is left-cancellative then there is a unique function μ that witnesses the
monotonicity of β.

Suppose that (β, μ) : STrans(A, B) is a stream transduction. The incremental
form F(β, μ) of the transduction (β, μ) describes the stream transformation in
explicit input/output increments. For example, F(β, μ)(⟨x₁⟩) = ⟨β(1), μ(1, x₁)⟩
and F(β, μ)(⟨x₁, x₂⟩) = ⟨β(1), μ(1, x₁), μ(x₁, x₂)⟩. The key property of the
incremental form is that π(F(β, μ)(x̄)) = β(π(x̄)) for every x̄ ∈ A*. For example,

    π(F(β, μ)(⟨x₁, x₂, x₃⟩)) = β(1) · μ(1, x₁) · μ(x₁, x₂) · μ(x₁x₂, x₃)
                             = β(x₁) · μ(x₁, x₂) · μ(x₁x₂, x₃)
                             = β(x₁x₂) · μ(x₁x₂, x₃)
                             = β(x₁x₂x₃).


Example 9 (Counting). Let A be an arbitrary set. We will describe a streaming
computation whose input type is the monoid FBag(A) and whose output
type is the monoid FSeq(N). The informal operational description is as follows:
there is no initial output, and every time a new data item arrives the computation
emits the total number of items seen so far. The formal description is
given by the stream transduction β : FBag(A) → FSeq(N), defined by β(∅) = ε
and β(x) = ⟨1, 2, …, |x|⟩ for every non-empty x ∈ FBag(A), where |x| denotes
the size of the multiset x. It is easy to see that β is monotone. Since FSeq(N)
is left-cancellative, the monotonicity witness function is uniquely determined:
μ(x, ∅) = ε and μ(x, y) = ⟨|x| + 1, …, |x| + |y|⟩ when y ≠ ∅.
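
To make the counting transduction and its incremental form tangible, here is a minimal sketch. It assumes `collections.Counter` as a model of FBag(A) and Python tuples as a model of FSeq(N); the helper names `size`, `incremental_form` and `flatten` are illustrative.

```python
from collections import Counter

def size(x: Counter) -> int:
    """|x|: number of items in the multiset, counting multiplicities."""
    return sum(x.values())

def beta(x: Counter) -> tuple:
    """β(x) = ⟨1, 2, ..., |x|⟩ (the empty tuple when x is empty)."""
    return tuple(range(1, size(x) + 1))

def mu(x: Counter, y: Counter) -> tuple:
    """μ(x, y) = ⟨|x|+1, ..., |x|+|y|⟩, the increment when x is extended by y."""
    return tuple(range(size(x) + 1, size(x) + size(y) + 1))

def incremental_form(fragments):
    """F(β, μ): the initial output followed by one increment per input fragment."""
    history, outputs = Counter(), [beta(Counter())]
    for x in fragments:
        outputs.append(mu(history, x))
        history = history + x   # Counter addition is multiset union
    return outputs

def flatten(seqs) -> tuple:
    """π on FSeq(N): concatenate a sequence of tuples."""
    return tuple(n for seq in seqs for n in seq)

frags = [Counter("ab"), Counter(), Counter("bc")]
increments = incremental_form(frags)
whole = sum(frags, Counter())
# Key property: flattening the increments gives β of the flattened input.
assert flatten(increments) == beta(whole)
```

The empty fragment contributes an empty increment, matching μ(x, ∅) = ε.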


Example 10 (Per-Key Aggregation). Let K be a set of keys, and V be
a set of values. The elements of K × V are typically called key-value pairs.
Suppose that op : V × V → V is an associative and commutative operation. So,
op can be generalized to an aggregation operation that takes non-empty finite
multisets over V as input. We will describe a streaming computation whose
input type is the monoid FBag(K × V) and whose output type is the monoid
FMap(K, V). Informally, every time an item (k, v) is processed, the output map
is updated so that the k-indexed entry contains the aggregate (using op) of all
values seen so far for the key k. The formal description of this computation is
given by the stream transduction β : FBag(K × V) → FMap(K, V), defined by
β(x) = {k ↦ op(x|ₖ) | k appears in x} for every multiset x, where x|ₖ denotes
the multiset that results from x by keeping only the pairs whose key is equal to
k. That is, the domain of β(x) is equal to dom(β(x)) = {k ∈ K | k appears in x}
and β(x)(k) = op(x|ₖ) for every k that appears in x. The monotonicity witness
function μ is defined as follows: μ(x, y) is equal to the restriction of the map
β(x ∪ y) to the set of all keys that appear in y.
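
The per-key aggregation can be sketched as follows. This is a hedged illustration that assumes `collections.Counter` over `(key, value)` tuples for FBag(K × V), a `dict` for FMap(K, V), and addition as the operation op; none of these choices is prescribed by the text.

```python
from collections import Counter
from functools import reduce
import operator

def beta(x: Counter, op=operator.add) -> dict:
    """β(x): map every key k that appears in x to op applied to the values of x|_k."""
    groups = {}
    for (k, v), count in x.items():
        groups.setdefault(k, []).extend([v] * count)
    return {k: reduce(op, vs) for k, vs in groups.items() if vs}

def mu(x: Counter, y: Counter, op=operator.add) -> dict:
    """μ(x, y): β(x ∪ y) restricted to the keys that appear in y."""
    full = beta(x + y, op)
    keys_in_y = {k for (k, _v), count in y.items() if count > 0}
    return {k: full[k] for k in keys_in_y}

def override(f: dict, g: dict) -> dict:
    """The product of FMap(K, V): entries of g take precedence over those of f."""
    return {**f, **g}

x = Counter([("a", 1), ("b", 2)])
y = Counter([("a", 10), ("c", 5)])
# Defining property of the witness: β(x) · μ(x, y) = β(x ∪ y).
assert override(beta(x), mu(x, y)) == beta(x + y)
```

Only the keys touched by the increment y are re-emitted, so the entry for `"b"` is not repeated in μ(x, y).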


We saw in Sect. 2 that we can form products of monoids: if A and B are
monoids, then so is A × B. Intuitively, we can think of A × B as the data
stream type that involves two parallel and independent channels: one channel
for streams of type A and another channel for streams of type B.


Example 11 (Merging of Multiple Input Channels). Given a set A, we
want to describe a transformation with two input channels of type FBag(A) and
one output channel of type FBag(A). The monotone function β : FBag(A) ×
FBag(A) → FBag(A), given by β(x, y) = x ∪ y for multisets x and y, describes
the merging of the two input substreams. Operationally, whenever a new data
item arrives (regardless of channel) it is propagated to the output channel. Since
FBag(A) is left-cancellative, the monotonicity witness function is uniquely
determined: μ((x₁, y₁), (x₂, y₂)) = (x₂ ∪ y₂) \ (x₁ ∪ y₁) for all x₁, y₁, x₂, y₂ ∈ FBag(A).


Example 12 (Flatten). Let A be a monoid. The function β : FSeq(A) → A,
given by β(x̄) = π(x̄) for every x̄ ∈ FSeq(A), describes the flattening of a
sequence of monoid elements. The function β is monotone, and its monotonicity
witness function μ is given by μ(x̄, ȳ) = π(ȳ) for all x̄ and ȳ. The stream
transduction flatten(A) = (β, μ) has type STrans(FSeq(A), A).


Example 13 (Split in Batches). Let Σ = {a, b} be an alphabet of symbols.
Suppose that we want to describe the decomposition of an element of
Σ* into batches of size exactly 3. We describe this using two functions r₁ :
Σ* → FSeq(Σ*) and r₂ : Σ* → Σ*. Informally, r₁ gives the sequence of full
batches of size 3, and r₂ gives the remaining incomplete batch. For example,
r₁(abbaabba) = ⟨abb, aab⟩ and r₂(abbaabba) = ba.

This idea of splitting in batches can be generalized from the monoid Σ* to
an arbitrary monoid A. We say that a splitter for A is a pair r = (r₁, r₂) of
functions r₁ : A → FSeq(A) and r₂ : A → A satisfying the following properties:
(1) the equality x = π(r₁(x)) · r₂(x) says that r₁ and r₂ decompose
x ∈ A, (2) r₁(1_A) = ε says that the unit cannot be decomposed, (3) r₁(x · y) =
r₁(x) · r₁(r₂(x) · y) and (4) r₂(x · y) = r₂(r₂(x) · y) describe how to decompose
the concatenation of two monoid elements. The first two properties imply
that r₂(1_A) = 1_A. The third property implies that r₁ is monotone. Define
μ(x, y) = r₁(r₂(x) · y) for x, y ∈ A and observe that r₁(x) · μ(x, y) = r₁(xy). It
follows that split(r) = (r₁, μ) is a stream transduction of type STrans(A, FSeq(A)).
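
The splitter for batches of size 3 can be written down directly and its four properties spot-checked. The sketch below assumes Python strings for Σ* and tuples of strings for FSeq(Σ*); the constant `BATCH` and the function names are illustrative.

```python
BATCH = 3  # batch size used in the example

def r1(x: str) -> tuple:
    """Sequence of full batches of size BATCH contained in x."""
    full = len(x) - len(x) % BATCH
    return tuple(x[i:i + BATCH] for i in range(0, full, BATCH))

def r2(x: str) -> str:
    """The remaining incomplete batch at the end of x."""
    return x[len(x) - len(x) % BATCH:]

def mu(x: str, y: str) -> tuple:
    """Monotonicity witness μ(x, y) = r1(r2(x) · y)."""
    return r1(r2(x) + y)

x, y = "abba", "abba"
assert "".join(r1(x)) + r2(x) == x          # property (1)
assert r1("") == ()                         # property (2)
assert r1(x + y) == r1(x) + r1(r2(x) + y)   # property (3)
assert r2(x + y) == r2(r2(x) + y)           # property (4)
assert r1(x) + mu(x, y) == r1(x + y)        # witness property
```

These assertions only test the properties on one pair of inputs; they are a sanity check rather than a proof.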


Our denotational model of a stream transformation uses a monotone function
whose domain is the monoid of (finite) input histories. We emphasize that such
a denotation can also describe the transformation of an infinite stream. To
illustrate this point in simple terms, consider a monotone function β : A* → B*,
where A (resp., B) is the type of input (resp., output) items. This function
extends uniquely to the ω-continuous function β^∞ : A^∞ → B^∞, where A^∞ = A* ∪
A^ω is the set of finite and infinite sequences over A, as follows: β^∞(a₀a₁a₂…)
is equal to the supremum of the chain β(ε) ⪯ β(a₀) ⪯ β(a₀a₁) ⪯ ⋯.


4 Model of Computation 


We will present an abstract model of computation for stream processing, where
the input and output data streams are elements of monoids A and B respectively.
A streaming algorithm is described by a transducer, a kind of automaton
that produces output values. We consider transducers that can have a potentially
infinite state space, which we denote by St. The computation starts at a
distinguished initial state init ∈ St, and the initialization triggers some initial
output o ∈ B. The computation then proceeds by consuming the input stream
incrementally, i.e. fragment by fragment. One step of the computation from a
state s ∈ St involves consuming an input fragment x ∈ A, producing an output
increment out(s, x) ∈ B and transitioning to the next state next(s, x) ∈ St.


Definition 14 (Stream Transducer). Let A, B be monoids. A stream transducer
with inputs from A and outputs from B is a tuple G = (St, init, o, next, out),
where St is a nonempty set of states, init ∈ St is the initial state, o ∈ B is the
initial output, next : St × A → St is the transition function, and out : St × A → B is
the output function. We write G(A, B) to denote the set of all stream transducers
with inputs from A and outputs from B.

We define the generalized transition function gnext : St × A* → St by
induction: gnext(s, ε) = s and gnext(s, ⟨x⟩ · ȳ) = gnext(next(s, x), ȳ) for all s ∈ St,
x ∈ A and ȳ ∈ A*. A state s ∈ St is said to be reachable in G if there exists a
sequence x̄ ∈ A* such that gnext(init, x̄) = s.

We define the generalized output function gout : St × A* → B by induction
on the second argument: gout(s, ε) = 1 and gout(s, ⟨x⟩ · ȳ) = out(s, x) ·
gout(next(s, x), ȳ) for all s ∈ St, x ∈ A and ȳ ∈ A*. The extended output function
eout : St × A* → B* is defined similarly: eout(s, ε) = ε and eout(s, ⟨x⟩ · ȳ) =
⟨out(s, x)⟩ · eout(next(s, x), ȳ) for all s ∈ St, x ∈ A and ȳ ∈ A*.
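
The three derived functions of Definition 14 can be sketched as simple loops. The following is an illustrative encoding that assumes a Python dataclass for the tuple (init, o, next, out) with the state set left implicit; the `counter` instance is a hypothetical transducer over tuples, used only to exercise the functions.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Transducer:
    init: Any                        # initial state
    o: Any                           # initial output
    next: Callable[[Any, Any], Any]  # transition function St x A -> St
    out: Callable[[Any, Any], Any]   # output function St x A -> B

def gnext(g: Transducer, s, fragments):
    """Generalized transition: the state reached after consuming all fragments."""
    for x in fragments:
        s = g.next(s, x)
    return s

def gout(g: Transducer, s, fragments, unit, op):
    """Generalized output: the product in B of all output increments."""
    acc = unit
    for x in fragments:
        acc = op(acc, g.out(s, x))
        s = g.next(s, x)
    return acc

def eout(g: Transducer, s, fragments) -> tuple:
    """Extended output: the sequence of output increments, one per fragment."""
    outs = []
    for x in fragments:
        outs.append(g.out(s, x))
        s = g.next(s, x)
    return tuple(outs)

# Hypothetical instance: counts the items seen so far; fragments are tuples.
counter = Transducer(init=0, o=(),
                     next=lambda s, x: s + len(x),
                     out=lambda s, x: tuple(range(s + 1, s + len(x) + 1)))
frags = [("a", "b"), (), ("c",)]
final_state = gnext(counter, counter.init, frags)
```

The output monoid is passed to `gout` explicitly as a unit and a binary operation, since Python has no built-in notion of a monoid.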

Example 15 (Transducer for Counting). Recall the counting streaming
computation that was described in Example 9. We will describe a stream transducer
that implements the counting computation. The input monoid is FBag(A)
and the output monoid is FSeq(N). The state space is St = N, because the
transducer has to maintain a counter that remembers the number of data items
seen so far. The initial state is init = 0 and the initial output is o = ε. The
transition function increments the counter, i.e. next(s, x) = s + |x| for every
s ∈ St and x ∈ FBag(A). The output function is defined by out(s, ∅) = ε and
out(s, x) = ⟨s + 1, …, s + |x|⟩ for a nonempty multiset x. The type of this
transducer is G(FBag(A), FSeq(N)).


Example 16 (Transducer for Merging). We will implement the merging
computation of Example 11, where there are two input channels of type FBag(A)
and one output channel of type FBag(A). The transducer does not need memory,
so St = Unit, where Unit = {⋆} is a singleton set. The initial state is
init = ⋆ and the initial output is o = ∅. There is only one possibility for the
transition function: next(s, (x, y)) = ⋆. The output function describes the
propagation of the input increments of both input channels to the output channel:
out(s, (x, y)) = x ∪ y for all multisets x, y. The type of this transducer is
G(FBag(A) × FBag(A), FBag(A)).


Example 17 (Flatten). For a monoid A, we define a transducer Flatten(A) =
(St, init, o, next, out) : G(FSeq(A), A) that implements the flattening transduction
of Example 12. This computation does not require memory, so we define
St = Unit and init = ⋆. The initial output is o = 1_A, the transition function
is uniquely determined by next(s, x) = ⋆, and the output function is given by
out(s, ⟨a₁, …, aₙ⟩) = a₁ ⋯ aₙ.


Example 18 (Split in Batches). For a monoid A and a splitter r = (r₁, r₂) for
A (Example 13), we describe a transducer Split(r) = (St, init, o, next, out) that
implements the transduction split(r) : STrans(A, FSeq(A)). We define St = A,
because the transducer needs to remember the remainder of the cumulative
input that does not yet form a complete batch, and init = 1_A. The initial output
o = ε is the empty sequence. The transition and output functions are defined by
next(s, x) = r₂(s · x) and out(s, x) = r₁(s · x).
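
A run of the Split transducer can be simulated in a few lines. The sketch assumes the batch-of-3 splitter over Python strings from Example 13; the helper `run_split` is a hypothetical driver that returns the cumulative output together with the final state.

```python
BATCH = 3  # batch size, as in Example 13

def r1(x: str) -> tuple:
    """Full batches of size BATCH contained in x."""
    full = len(x) - len(x) % BATCH
    return tuple(x[i:i + BATCH] for i in range(0, full, BATCH))

def r2(x: str) -> str:
    """Remaining incomplete batch of x."""
    return x[len(x) - len(x) % BATCH:]

def run_split(fragments):
    """Feed the fragments to Split(r); return (cumulative output, final state)."""
    state = ""      # init = 1_A, the empty string
    emitted = ()    # o = ε, the empty sequence of batches
    for x in fragments:
        emitted += r1(state + x)   # out(s, x) = r1(s · x)
        state = r2(state + x)      # next(s, x) = r2(s · x)
    return emitted, state

# Two different fragmentations of the same input history "abbaabba".
assert run_split(["ab", "baab", "ba"]) == run_split(["abbaabba"])
```

The state only ever holds fewer than `BATCH` symbols, which is the remainder that does not yet form a complete batch.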


Definition 14 does not capture a key requirement for streaming computations
over monoids, namely that the cumulative output of a transducer G should be
independent of the particular way in which the input history is split into the
fragments that are fed to it. More precisely, suppose that w is an input history
that can be fragmented (factorized) in two different ways: w = u₁ · u₂ ⋯ uₘ
and w = v₁ · v₂ ⋯ vₙ. Then, the cumulative output of the transducer G when
consuming the sequence of fragments (factorization) u₁, u₂, …, uₘ should be
equal to the cumulative output when consuming v₁, v₂, …, vₙ. In Definition 20
below, we formulate a set of coherence conditions that a transducer must adhere
to in order to satisfy this “factorization independence” requirement.

Definition 19 (Bisimulation & Bisimilarity). Let G = (St, init, o, next, out)
be a transducer with inputs from A and outputs from B. A relation R ⊆ St × St
is a bisimulation for G if for every s, t ∈ St and x ∈ A we have that (s, t) ∈ R
implies out(s, x) = out(t, x) and (next(s, x), next(t, x)) ∈ R. We will also use the
notation sRt to mean (s, t) ∈ R. We say that the states s, t ∈ St are bisimilar,
denoted s ~ t, if there exists a bisimulation R for G such that sRt. The relation
~ is called the bisimilarity relation for G.


It is well known that the bisimilarity relation for G is an equivalence relation
(reflexive, symmetric, and transitive), and for all s, t ∈ St and x ∈ A it satisfies
the following extension property: s ~ t implies that next(s, x) ~ next(t, x). It
can then be easily seen that the bisimilarity relation is a bisimulation. In fact,
it is the largest bisimulation for the transducer G.


Definition 20 (Coherence). Suppose G = (St, init, o, next, out) : G(A, B) is a
stream transducer. We say that G is coherent if it satisfies the following:

(N1) next(init, 1) ~ init.

(N2) next(init, xy) ~ next(next(init, x), y) for every x, y ∈ A.

(O1) o · out(init, 1) = o.

(O2) o · out(init, xy) = o · out(init, x) · out(next(init, x), y) for every x, y ∈ A.


The coherence conditions of Definition 20 capture the idea that the transducer
behaves in “essentially the same way” regardless of how the input is split
into fragments. For example, the condition (N2) says that the two-step transition
init →ˣ s₁ →ʸ s₂ and the single-step transition init →ˣʸ t₁ end up in states
(s₂ and t₁) that will have exactly the same behavior in the subsequent computation.
In other words, it does not matter whether the input xy was fed to the
transducer as a single fragment xy or as a sequence of two fragments ⟨x, y⟩.

Let (A, ·, 1) be a monoid. A factorization of an element x ∈ A is a sequence
x₁, …, xₙ of elements of A such that x = x₁ ⋯ xₙ. In particular, the empty
sequence ε ∈ A* is a factorization of 1. In other words, x̄ ∈ A* is a factorization
of x ∈ A if π(x̄) = x.


Theorem 21 (Factorization Independence). Let G = (St, init, o, next, out)
be a stream transducer of type G(A, B). If G is coherent, then for every x ∈ A
and every factorization x̄ ∈ A* of x we have that o · gout(init, x̄) = o · out(init, x).


Proof. For clarity, we write ⟨x₁, x₂, …, xₙ⟩ ∈ A* to denote a finite sequence of
elements of A. The following properties hold for all s ∈ St, x̄ ∈ A* and y ∈ A:

    gnext(s, x̄ · ⟨y⟩) = next(gnext(s, x̄), y)                  (1)
    gout(s, x̄ · ⟨y⟩)  = gout(s, x̄) · out(gnext(s, x̄), y)      (2)
    eout(s, x̄ · ⟨y⟩)  = eout(s, x̄) · ⟨out(gnext(s, x̄), y)⟩    (3)

Each property shown above can be proved by induction on the sequence x̄.
Consider an arbitrary coherent stream transducer G = (St, init, o, next, out).
We claim that G satisfies the following coherence property:

    gnext(init, ⟨x₁, …, xₙ⟩) ~ next(init, x₁ ⋯ xₙ) for all ⟨x₁, …, xₙ⟩ ∈ A*.    (N*)

The proof is by induction on the length of the sequence. For the base case, we
have that gnext(init, ε) = init and next(init, 1) are bisimilar because G is coherent
(recall Property (N1) of Definition 20). For the induction step we have:

    gnext(init, x̄ · ⟨y⟩) = next(gnext(init, x̄), y)      [Equation (1)]
                         ~ next(next(init, π(x̄)), y)    [I.H., extension]
                         ~ next(init, π(x̄) · y),        [coherence (N2)]

which is equal to next(init, π(x̄ · ⟨y⟩)). This concludes the proof of the claim (N*).

The proof of the theorem proceeds by induction on x̄ ∈ A*. For the base case,
observe that o · gout(init, ε) = o · 1 = o is equal to o · out(init, 1) = o (property
(O1) for G). For the induction step, we have:

    o · gout(init, x̄ · ⟨y⟩) = o · gout(init, x̄) · out(gnext(init, x̄), y)        [Eq. (2)]
                            = o · out(init, π(x̄)) · out(gnext(init, x̄), y)     [I.H.]
                            = o · out(init, π(x̄)) · out(next(init, π(x̄)), y)   [Prop. (N*)]
                            = o · out(init, π(x̄) · y)                          [Prop. (O2)]

which is equal to o · out(init, π(x̄ · ⟨y⟩)).


Theorem 21 says that the condition of coherence guarantees a basic correctness
property for stream transducers: the output that they produce does not
depend on the specific way in which the input was partitioned into fragments.
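
Factorization independence can be tested exhaustively on small inputs. The sketch below uses the counting transducer of Example 15, with the assumption that input fragments are Python tuples standing in for multisets (only their sizes matter here); the helper `factorizations` is an illustrative enumeration of all cuts into nonempty fragments.

```python
from itertools import combinations

INIT, O = 0, ()   # initial state and initial output of the counting transducer

def nxt(s: int, x: tuple) -> int:
    """next(s, x) = s + |x|."""
    return s + len(x)

def out(s: int, x: tuple) -> tuple:
    """out(s, x) = ⟨s+1, ..., s+|x|⟩ (empty for an empty fragment)."""
    return tuple(range(s + 1, s + len(x) + 1))

def cumulative(fragments) -> tuple:
    """o · gout(init, fragments): the cumulative output for a fragment sequence."""
    s, acc = INIT, O
    for x in fragments:
        acc += out(s, x)
        s = nxt(s, x)
    return acc

def factorizations(w: tuple):
    """All ways to cut a nonempty tuple w into consecutive nonempty fragments."""
    n = len(w)
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield [w[i:j] for i, j in zip(bounds, bounds[1:])]

w = ("a", "b", "c", "d")
expected = O + out(INIT, w)   # o · out(init, w)
assert all(cumulative(f) == expected for f in factorizations(w))
```

Passing this check on one input does not establish coherence; it only illustrates what Theorem 21 guarantees for a coherent transducer.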

For a transducer G = (St, init, o, next, out) we define the function ⟦G⟧ : A* →
B* as follows: ⟦G⟧(x̄) = ⟨o⟩ · eout(init, x̄) for every x̄ ∈ A*. We call ⟦G⟧ the
interpretation or denotation of G. The definition of ⟦G⟧ implies that ⟦G⟧(ε) =
⟨o⟩ and the following holds for every x̄ ∈ A* and y ∈ A:

    ⟦G⟧(x̄ · ⟨y⟩) = ⟦G⟧(x̄) · ⟨out(gnext(init, x̄), y)⟩    (4)

When G is coherent, Theorem 21 says that the denotation gives the same cumulative
output for any two factorizations of the input. We say that the transducers
G₁ and G₂ are equivalent if their denotations are equal, i.e. ⟦G₁⟧ = ⟦G₂⟧.


Definition 22 (The Implementation Relation). Let A, B be monoids, G :
G(A, B) be a stream transducer, and (β, μ) : STrans(A, B) be a stream transduction.
We say that G implements (β, μ) if ⟦G⟧(x̄) = F(β, μ)(x̄) for every x̄ ∈ A*.


Theorem 23 (Implementation & Coherence). A stream transducer G : 
G(A, B) is coherent if and only if it implements some stream transduction. 


Proof. Suppose that G = (St, init, o, next, out) : G(A, B) is a coherent transducer.
Define the function β : A → B by β(x) = o · out(init, x) for every x ∈ A, and
the function μ : A × A → B by μ(x, y) = out(next(init, x), y) for all x, y ∈
A. For any x, y ∈ A, we have to establish that β(x) · μ(x, y) = β(xy). This
follows immediately from Part (O2) of the coherence property for G. So, (β, μ)
is a stream transduction. It remains to prove that G implements (β, μ), that is,
⟦G⟧(x̄) = F(β, μ)(x̄) for every x̄ ∈ A*. For the base case, we have ⟦G⟧(ε) = ⟨o⟩
and F(β, μ)(ε) = ⟨β(1)⟩, which are equal because β(1) = o · out(init, 1) = o by
(O1). For the step case, we observe that:

    ⟦G⟧(x̄ · ⟨y⟩) = ⟦G⟧(x̄) · ⟨out(gnext(init, x̄), y)⟩    [Equation (4)]
    F(β, μ)(x̄ · ⟨y⟩) = F(β, μ)(x̄) · ⟨μ(π(x̄), y)⟩        [def. of F(β, μ)]

By the induction hypothesis, it suffices to show that out(gnext(init, x̄), y) is
equal to μ(π(x̄), y) = out(next(init, π(x̄)), y). This follows from the fact that
gnext(init, x̄) and next(init, π(x̄)) are bisimilar, see Property (N*).

For the converse, suppose that G = (St, init, o, next, out) : G(A, B) is a transducer
that implements (β, μ) : STrans(A, B). Define the relation R as:

    R = {(s, t) ∈ St × St | there are x̄, ȳ ∈ A* with π(x̄) = π(ȳ) s.t.
                            s = gnext(init, x̄) and t = gnext(init, ȳ)}.

We claim that R is a bisimulation. Consider arbitrary states s, t ∈ St with
sRt and z ∈ A. It follows that there are x̄, ȳ ∈ A* with π(x̄) = π(ȳ) such that
s = gnext(init, x̄) and t = gnext(init, ȳ). We have to show that out(s, z) = out(t, z)
and next(s, z) R next(t, z). First, notice that:

    ⟦G⟧(x̄ · ⟨z⟩) = ⟦G⟧(x̄) · ⟨out(s, z)⟩              [Equation (4), def. of s]
    F(β, μ)(x̄ · ⟨z⟩) = F(β, μ)(x̄) · ⟨μ(π(x̄), z)⟩    [def. of F(β, μ)]

Since G implements (β, μ), we have that ⟦G⟧(x̄ · ⟨z⟩) = F(β, μ)(x̄ · ⟨z⟩) and therefore
out(s, z) = μ(π(x̄), z). Similarly, we can obtain that out(t, z) = μ(π(ȳ), z).
From π(x̄) = π(ȳ) we get that μ(π(x̄), z) = μ(π(ȳ), z), and therefore out(s, z) =
out(t, z). Now, observe that s′ = next(s, z) = next(gnext(init, x̄), z) =
gnext(init, x̄ · ⟨z⟩) using Property (1). Similarly, we have that t′ = next(t, z) =
gnext(init, ȳ · ⟨z⟩). From π(x̄ · ⟨z⟩) = π(x̄)z = π(ȳ)z = π(ȳ · ⟨z⟩) we conclude
that s′ R t′. We have thus established that R is a bisimulation.

Now, we are ready to prove that G is coherent. We will only present the cases
of Part (N2) and Part (O2), since they are the most interesting ones. Let x, y ∈ A.
For Part (N2), we have to show that the states s = next(next(init, x), y) and
t = next(init, xy) are bisimilar. Since R (previous paragraph) is a bisimulation, it
suffices to show that (s, t) ∈ R. Indeed, this is true because s = gnext(init, ⟨x, y⟩),
t = gnext(init, ⟨xy⟩) and π(⟨x, y⟩) = xy = π(⟨xy⟩). For Part (O2), we have that
⟦G⟧(⟨xy⟩) = ⟨o, out(init, xy)⟩ and F(β, μ)(⟨xy⟩) = ⟨β(1), μ(1, xy)⟩, as well as

    ⟦G⟧(⟨x, y⟩) = ⟨o, out(init, x), out(next(init, x), y)⟩ and

    F(β, μ)(⟨x, y⟩) = ⟨β(1), μ(1, x), μ(x, y)⟩,

using the definitions of ⟦G⟧ and F. Since G implements (β, μ), we know that
⟦G⟧(⟨x, y⟩) = F(β, μ)(⟨x, y⟩) and ⟦G⟧(⟨xy⟩) = F(β, μ)(⟨xy⟩). Using all the above,
we get that o · out(init, x) · out(next(init, x), y) = β(1) · μ(1, x) · μ(x, y) = β(x) ·
μ(x, y) = β(xy) and o · out(init, xy) = β(1) · μ(1, xy) = β(xy). So, Part (O2) of
the coherence property holds.

Theorem 23 provides justification for our definition of the coherence property
for stream transducers (recall Definition 20). It says that the definition is
exactly appropriate, because it is a necessary and sufficient condition for a stream
transducer to have a stream transduction as its denotation. In other words, the
coherence property characterizes the transducers that have a well-defined
denotational semantics in terms of transductions. It offers this guarantee of correctness
without limiting their expressive power as implementations of transductions.


Theorem 24 (Expressive Completeness). Let A and B be monoids, and
(β, μ) be a stream transduction in STrans(A, B). There exists a coherent stream
transducer that implements (β, μ).


Proof. Recall from Definition 8 that the monotonicity witness function μ satisfies
the following property: β(x) · μ(x, y) = β(xy) for every x, y ∈ A. Now, we define
the transducer G = (St, init, o, next, out) as follows: St = A, init = 1, o = β(1),
next(s, x) = s · x and out(s, x) = μ(s, x) for every state s ∈ St and input x ∈ A.
The following properties hold for every s ∈ St and ⟨x₁, …, xₙ⟩ ∈ A*:

    gnext(s, ⟨x₁, …, xₙ⟩) = s · x₁ ⋯ xₙ, and                   (5)
    ⟨o⟩ · eout(init, ⟨x₁, …, xₙ⟩) = F(β, μ)(⟨x₁, …, xₙ⟩).      (6)

Both these properties are shown by induction on the sequence ⟨x₁, …, xₙ⟩. It
follows that ⟦G⟧(x̄) = ⟨o⟩ · eout(init, x̄) = F(β, μ)(x̄) for every x̄ ∈ A*. So, G
implements the transduction (β, μ). Finally, G is coherent by Theorem 23. □
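
The construction in the proof of Theorem 24 is short enough to transcribe. The sketch below assumes that a monoid is passed as a unit and a binary operation, and that a transducer is a plain 4-tuple (init, o, next, out); the counting transduction over Python strings is used only as an illustrative instance.

```python
def canonical_transducer(beta, mu, unit, op):
    """Transducer with St = A, init = 1, o = β(1), next(s, x) = s·x, out(s, x) = μ(s, x)."""
    return unit, beta(unit), (lambda s, x: op(s, x)), (lambda s, x: mu(s, x))

def denotation(g, fragments) -> tuple:
    """⟦G⟧(x̄) = ⟨o⟩ · eout(init, x̄)."""
    init, o, nxt, out = g
    s, outs = init, [o]
    for x in fragments:
        outs.append(out(s, x))
        s = nxt(s, x)
    return tuple(outs)

def incremental_form(beta, mu, unit, op, fragments) -> tuple:
    """F(β, μ)(x̄): β(1) followed by μ(history, fragment) for each fragment."""
    history, outs = unit, [beta(unit)]
    for x in fragments:
        outs.append(mu(history, x))
        history = op(history, x)
    return tuple(outs)

# Illustrative transduction: counting over strings (input monoid A*, output FSeq(N)).
beta = lambda x: tuple(range(1, len(x) + 1))
mu = lambda x, y: tuple(range(len(x) + 1, len(x) + len(y) + 1))
concat = lambda a, b: a + b

G = canonical_transducer(beta, mu, "", concat)
frags = ["ab", "", "cde"]
assert denotation(G, frags) == incremental_form(beta, mu, "", concat, frags)
```

This canonical transducer stores the entire input history as its state, so it shows expressiveness rather than an efficient implementation.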


Theorem 24 assures us that the abstract computational model of coherent
stream transducers is expressive enough to implement any stream transduction.
For this reason, we will be using stream transducers as the basic programming
model for describing streaming computations.


Example 25 (Correctness of Flatten). Using induction, we will show that
the transducer G = Flatten(A) = (Unit, ⋆, 1_A, next, out) implements the
transduction (π, μ) = flatten(A) for a monoid A (recall Examples 12 and 17). We
show by induction that ⟦G⟧(x̄) = F(π, μ)(x̄) for every x̄ ∈ FSeq(A)*. For the
base case, we have that ⟦G⟧(ε) = ⟨1_A⟩ and F(π, μ)(ε) = ⟨π(ε)⟩ = ⟨1_A⟩. Now,

    ⟦G⟧(x̄ · ⟨y⟩) = ⟦G⟧(x̄) · ⟨out(gnext(init, x̄), y)⟩    [def. of ⟦G⟧]
                 = F(π, μ)(x̄) · ⟨π(y)⟩                  [I.H. and def. of out]
                 = F(π, μ)(x̄) · ⟨μ(π(x̄), y)⟩            [def. of μ]
                 = F(π, μ)(x̄ · ⟨y⟩)                     [def. of F]

for all x̄ ∈ FSeq(A)* and y ∈ FSeq(A). We have thus proved that Flatten(A) is
correct: its denotation is equal to the intended semantics.


Example 26 (Correctness of Split). We will establish that the transducer for
splitting in batches is correct, namely that G = Split(r) = (A, 1_A, ε, next, out)
implements (r₁, μ) = split(r) for a splitter r = (r₁, r₂) for the monoid A (recall
Examples 13 and 18). Using the properties of splitters and an argument by
induction, we obtain that gnext(init, x̄) = r₂(π(x̄)) for every x̄ ∈ A*. We show
by induction that ⟦G⟧(x̄) = F(r₁, μ)(x̄) for every x̄ ∈ A*. For the base case, we
have that ⟦G⟧(ε) = ⟨ε⟩ and F(r₁, μ)(ε) = ⟨r₁(1_A)⟩ = ⟨ε⟩. Now,

    ⟦G⟧(x̄ · ⟨y⟩) = ⟦G⟧(x̄) · ⟨out(gnext(init, x̄), y)⟩    [Equation (4)]
                 = F(r₁, μ)(x̄) · ⟨out(r₂(π(x̄)), y)⟩     [I.H. and previous claim]
                 = F(r₁, μ)(x̄) · ⟨r₁(r₂(π(x̄)) · y)⟩     [def. of out]
                 = F(r₁, μ)(x̄) · ⟨μ(π(x̄), y)⟩           [def. of μ]
                 = F(r₁, μ)(x̄ · ⟨y⟩)                    [def. of F]

for all x̄ ∈ A* and y ∈ A. We have thus established that Split(r) is correct: its
denotation is equal to the intended semantics.


5 Combinators for Deterministic Dataflow 


We consider four dataflow combinators: (1) the lifting of pure morphisms to 
streaming computations, (2) serial composition for exposing pipeline parallelism, 
(3) parallel composition for exposing task-based parallelism, and (4) feedback 
composition for describing computations whose current output depends on
previously produced output. The combinators are defined both for stream
transductions (semantic objects) and for stream transducers (programs). Table 1 shows
the definitions. The lifting of pure morphisms is implemented with a stateless
transducer (i.e., the state space is a singleton set). Both parallel and serial com- 
position are implemented using a product construction on transducers. In the 
case of parallel composition, each component computes independently. In the 
case of serial composition, the output of the first component is passed as input 
to the second component. In the case of feedback composition, the computation 
proceeds in well-defined rounds in order to prevent divergence. 

We prove a precise correspondence between the semantics-level and program-level
combinators for all cases: lifting (Proposition 27), parallel composition
(Proposition 28), serial composition (Proposition 29), and feedback composition
(Proposition 30). These are essentially correctness properties for the implementations
of the combinators Lift, Par, Serial, Loop. They establish that our
typed framework is appropriate for the modular specification of complex stream- 
ing computations, as it can support composition constructs that are essential for 
parallelization and distribution. 


Proposition 27 (Lifting). Let $h : A \to B$ be a monoid homomorphism. Then,
Lift(h) is a coherent transducer and it implements the transduction lift(h).


Proposition 28 (Parallel Composition). Let $A_1, A_2, B_1, B_2$ be monoids,
$(\beta_1, \mu_1) : \mathrm{STrans}(A_1, B_1)$ and $(\beta_2, \mu_2) : \mathrm{STrans}(A_2, B_2)$ be transductions, and
$G_1 : \mathcal{G}(A_1, B_1)$ and $G_2 : \mathcal{G}(A_2, B_2)$ be transducers.

(1) IMPLEMENTATION: If $G_1$ implements $(\beta_1, \mu_1)$ and $G_2$ implements $(\beta_2, \mu_2)$,
then $\mathrm{Par}(G_1, G_2)$ implements $(\beta_1, \mu_1) \parallel (\beta_2, \mu_2)$.
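Table 1. Combinators for deterministic dataflow.


Lifting of monoid homomorphisms

\[
\begin{array}{lll}
\text{monoid homomorphism } h : A \to B & \beta(x) = h(x) & \\
\mathrm{lift}(h) = (\beta, \mu) : \mathrm{STrans}(A, B) & \mu(x, y) = h(y) & \\
\mathrm{Lift}(h) = (\mathrm{St}, \mathrm{init}, o, \mathrm{next}, \mathrm{out}) & \mathrm{init} = \star \text{ (the unique element of Unit)} & \mathrm{next}(s, x) = s \\
\mathrm{St} = \mathrm{Unit} & o = h(1) & \mathrm{out}(s, x) = h(x)
\end{array}
\]


Parallel composition

\[
\begin{array}{ll}
(\beta_1, \mu_1) : \mathrm{STrans}(A_1, B_1) & (\beta_2, \mu_2) : \mathrm{STrans}(A_2, B_2) \\
\multicolumn{2}{l}{(\beta_1, \mu_1) \parallel (\beta_2, \mu_2) = (\beta, \mu) : \mathrm{STrans}(A_1 \times A_2, B_1 \times B_2)} \\
\beta((x_1, x_2)) = (\beta_1(x_1), \beta_2(x_2)) & \mu((x_1, x_2), (y_1, y_2)) = (\mu_1(x_1, y_1), \mu_2(x_2, y_2)) \\
G_1 = (\mathrm{St}_1, \mathrm{init}_1, o_1, \mathrm{next}_1, \mathrm{out}_1) & \mathrm{init} = (\mathrm{init}_1, \mathrm{init}_2) \\
G_2 = (\mathrm{St}_2, \mathrm{init}_2, o_2, \mathrm{next}_2, \mathrm{out}_2) & o = (o_1, o_2) \\
\mathrm{Par}(G_1, G_2) = (\mathrm{St}, \mathrm{init}, o, \mathrm{next}, \mathrm{out}) & \mathrm{next}((s_1, s_2), (a, c)) = (\mathrm{next}_1(s_1, a), \mathrm{next}_2(s_2, c)) \\
\mathrm{St} = \mathrm{St}_1 \times \mathrm{St}_2 & \mathrm{out}((s_1, s_2), (a, c)) = (\mathrm{out}_1(s_1, a), \mathrm{out}_2(s_2, c))
\end{array}
\]


Serial composition

\[
\begin{array}{ll}
(\beta_1, \mu_1) : \mathrm{STrans}(A, B) \qquad (\beta_2, \mu_2) : \mathrm{STrans}(B, C) & \beta(x) = \beta_2(\beta_1(x)) \\
(\beta_1, \mu_1) \gg (\beta_2, \mu_2) = (\beta, \mu) : \mathrm{STrans}(A, C) & \mu(x, y) = \mu_2(\beta_1(x), \mu_1(x, y)) \\
G_1 = (\mathrm{St}_1, \mathrm{init}_1, o_1, \mathrm{next}_1, \mathrm{out}_1) & o = o_2 \cdot \mathrm{out}_2(\mathrm{init}_2, o_1) \\
G_2 = (\mathrm{St}_2, \mathrm{init}_2, o_2, \mathrm{next}_2, \mathrm{out}_2) & \mathrm{next}((s_1, s_2), a) = (\mathrm{next}_1(s_1, a), \mathrm{next}_2(s_2, \mathrm{out}_1(s_1, a))) \\
\mathrm{Serial}(G_1, G_2) = (\mathrm{St}_1 \times \mathrm{St}_2, \mathrm{init}, o, \mathrm{next}, \mathrm{out}) & \mathrm{out}((s_1, s_2), a) = \mathrm{out}_2(s_2, \mathrm{out}_1(s_1, a)) \\
\mathrm{init} = (\mathrm{init}_1, \mathrm{next}_2(\mathrm{init}_2, o_1)) &
\end{array}
\]


Feedback composition

\[
\begin{array}{l}
(\beta, \mu) : \mathrm{STrans}(A \times B, B) \\
\mathrm{loopB}(\beta, \mu) = (\gamma, \nu) : \mathrm{STrans}(\mathrm{FSeq}(A), \mathrm{FSeq}(B)) \\
\gamma(\langle a_1, \ldots, a_n \rangle) = \langle b_0, b_1, \ldots, b_n \rangle \\
\gamma(\varepsilon) = \langle b_0 \rangle, \text{ where } b_0 = \beta(1_A, 1_B) \\
\gamma(\langle a_1, \ldots, a_n, a_{n+1} \rangle) = \gamma(\langle a_1, \ldots, a_n \rangle) \cdot \langle b_{n+1} \rangle, \text{ where} \\
\qquad b_{n+1} = \mu((a_1 \cdots a_n,\; b_0 b_1 \cdots b_{n-1}),\; (a_{n+1}, b_n)) \\[4pt]
G = (\mathrm{St}, \mathrm{init}, o, \mathrm{next}, \mathrm{out}) : \mathcal{G}(A \times B, B) \\
\mathrm{LoopB}(G) = (\mathrm{St}', \mathrm{init}', o', \mathrm{next}', \mathrm{out}') : \mathcal{G}(\mathrm{FSeq}(A), \mathrm{FSeq}(B)) \\
\mathrm{St}' = \mathrm{St} \times B \quad \text{(second component: last output batch)} \\
\mathrm{init}' = (\mathrm{init}, o) \text{ and } o' = \langle o \rangle \\
\mathrm{next}'((s, b), a) = (\mathrm{next}(s, (a, b)), \mathrm{out}(s, (a, b))) \\
\mathrm{out}'((s, b), a) = \langle \mathrm{out}(s, (a, b)) \rangle \\[4pt]
(\beta, \mu) : \mathrm{STrans}(A \times B, B) \qquad \text{splitter } r \text{ for } A \\
\mathrm{loop}(\beta, \mu, r) = \mathrm{split}(r) \gg \mathrm{loopB}(\beta, \mu) \gg \mathrm{flatten}(B) : \mathrm{STrans}(A, B) \\[4pt]
G : \mathcal{G}(A \times B, B) \qquad \text{splitter } r \text{ for } A \\
\mathrm{Loop}(G, r) = \mathrm{Serial}(\mathrm{Split}(r), \mathrm{LoopB}(G), \mathrm{Flatten}(B)) : \mathcal{G}(A, B)
\end{array}
\]
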
(2) COHERENCE: If $G_1$ and $G_2$ are coherent, then so is $\mathrm{Par}(G_1, G_2)$.


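Proof. Notice that Part (2) follows immediately from Part (1) and Theorem 23.
Define $f = [\![\mathrm{Par}(G_1, G_2)]\!]$ and $(\beta, \mu) = (\beta_1, \mu_1) \parallel (\beta_2, \mu_2)$. We will show that
$f(\bar{w}) = F(\beta, \mu)(\bar{w})$ for every $\bar{w} \in (A_1 \times A_2)^*$. Suppose that fst is the (elementwise)
left projection function. We claim that $\mathrm{fst}(\mathrm{gnext}(s, \bar{w})) = \mathrm{gnext}_1(\mathrm{fst}(s), \mathrm{fst}(\bar{w}))$
and $\mathrm{fst}(\mathrm{eout}(s, \bar{w})) = \mathrm{eout}_1(\mathrm{fst}(s), \mathrm{fst}(\bar{w}))$ for all $s \in \mathrm{St}$ and $\bar{w} \in (A_1 \times A_2)^*$. Both
claims are shown by induction on the length of $\bar{w}$, and they give
$\mathrm{fst}(f(\bar{w})) = [\![G_1]\!](\mathrm{fst}(\bar{w}))$. With similar arguments we can
obtain that $\mathrm{snd}(f(\bar{w})) = [\![G_2]\!](\mathrm{snd}(\bar{w}))$ for every $\bar{w} \in (A_1 \times A_2)^*$. It can be shown
by induction that $\mathrm{fst}(F(\beta, \mu)(\bar{w})) = F(\beta_1, \mu_1)(\mathrm{fst}(\bar{w}))$ and $\mathrm{snd}(F(\beta, \mu)(\bar{w})) =
F(\beta_2, \mu_2)(\mathrm{snd}(\bar{w}))$ for all $\bar{w} \in (A_1 \times A_2)^*$. In order to establish that $f(\bar{w}) =
F(\beta, \mu)(\bar{w})$, it suffices to show that $\mathrm{fst}(f(\bar{w})) = \mathrm{fst}(F(\beta, \mu)(\bar{w}))$ and $\mathrm{snd}(f(\bar{w})) =
\mathrm{snd}(F(\beta, \mu)(\bar{w}))$. Given the claims shown previously, these equalities are equivalent
to $[\![G_1]\!](\mathrm{fst}(\bar{w})) = F(\beta_1, \mu_1)(\mathrm{fst}(\bar{w}))$ and $[\![G_2]\!](\mathrm{snd}(\bar{w})) = F(\beta_2, \mu_2)(\mathrm{snd}(\bar{w}))$
respectively. These equalities follow from the assumptions that $G_1$ implements
$(\beta_1, \mu_1)$ and $G_2$ implements $(\beta_2, \mu_2)$.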


Proposition 29 (Serial Composition). Let $A, B, C$ be monoids, $(\beta_1, \mu_1) :
\mathrm{STrans}(A, B)$ and $(\beta_2, \mu_2) : \mathrm{STrans}(B, C)$ be transductions, and $G_1 : \mathcal{G}(A, B)$ and
$G_2 : \mathcal{G}(B, C)$ be transducers.

(1) IMPLEMENTATION: If $G_1$ implements $(\beta_1, \mu_1)$ and $G_2$ implements $(\beta_2, \mu_2)$,
then $\mathrm{Serial}(G_1, G_2)$ implements $(\beta_1, \mu_1) \gg (\beta_2, \mu_2)$.

(2) COHERENCE: If $G_1$ and $G_2$ are coherent, then so is $\mathrm{Serial}(G_1, G_2)$.


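Proof. Part (2) follows easily from Part (1) and Theorem 23. In order to prove
Part (1) we have to first establish a number of preliminary facts. We define the
function $M_2 : A^* \to A^*$, which merges the first two elements of a sequence, as
follows: $M_2(\varepsilon) = \varepsilon$, $M_2(\langle x \rangle) = \langle x \rangle$ for $x \in A$, and
$M_2(\langle x, y \rangle \cdot \bar{z}) = \langle xy \rangle \cdot \bar{z}$ for $x, y \in A$ and $\bar{z} \in A^*$. We write $G$ to denote $G_1 \gg G_2$.

\begin{align*}
\mathrm{fst}(\mathrm{gnext}(s, \bar{x})) &= \mathrm{gnext}_1(\mathrm{fst}(s), \bar{x}) && \text{for all } s \in \mathrm{St} \text{ and } \bar{x} \in A^* \tag{7} \\
\mathrm{snd}(\mathrm{gnext}(s, \bar{x})) &= \mathrm{gnext}_2(\mathrm{snd}(s), \mathrm{eout}_1(\mathrm{fst}(s), \bar{x})) && \text{for all } s \in \mathrm{St} \text{ and } \bar{x} \in A^* \tag{8} \\
[\![G]\!](\bar{x}) &= M_2([\![G_2]\!]([\![G_1]\!](\bar{x}))) && \text{for all } \bar{x} \in A^* \tag{9} \\
F(\beta, \mu)(\bar{x}) &= M_2(F(\beta_2, \mu_2)(F(\beta_1, \mu_1)(\bar{x}))) && \text{for all } \bar{x} \in A^* \tag{10}
\end{align*}

where $(\beta, \mu) = (\beta_1, \mu_1) \gg (\beta_2, \mu_2)$. All four claims above are proved by induction
on the sequence $\bar{x}$. Equations (7) and (8) are needed to prove Equation (9). Now,
we will establish that $G$ implements $(\beta, \mu)$. Indeed, we have that

\begin{align*}
[\![G]\!](\bar{x}) &= M_2([\![G_2]\!]([\![G_1]\!](\bar{x}))) && \text{[Equation (9)]} \\
  &= M_2([\![G_2]\!](F(\beta_1, \mu_1)(\bar{x}))) && \text{[$G_1$ implements $(\beta_1, \mu_1)$]} \\
  &= M_2(F(\beta_2, \mu_2)(F(\beta_1, \mu_1)(\bar{x}))) && \text{[$G_2$ implements $(\beta_2, \mu_2)$]} \\
  &= F(\beta, \mu)(\bar{x}) && \text{[Equation (10)]}
\end{align*}

for every $\bar{x} \in A^*$. So, we conclude that $G$ implements $(\beta, \mu)$. □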
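
Equation (9) can be exercised on a small instance. The following sketch is only an illustration: the `Transducer` record, the helper names, and the running-sum transducer on batches of integers are assumptions made for this example, not definitions from the paper. It builds Serial(G1, G2) as in Table 1 and compares its denotation with M2 applied to the composition of the two denotations.

```python
from typing import Any, Callable, NamedTuple


class Transducer(NamedTuple):
    init: Any
    o: Any
    next: Callable[[Any, Any], Any]
    out: Callable[[Any, Any], Any]


def denotation(g, xs):
    """[[G]](xs): the initial output followed by one output per input item."""
    s, result = g.init, [g.o]
    for x in xs:
        result.append(g.out(s, x))
        s = g.next(s, x)
    return result


def serial(g1, g2, mul):
    """Serial(G1, G2) as in Table 1; mul is the product of the output monoid."""
    return Transducer(
        init=(g1.init, g2.next(g2.init, g1.o)),
        o=mul(g2.o, g2.out(g2.init, g1.o)),
        next=lambda s, a: (g1.next(s[0], a), g2.next(s[1], g1.out(s[0], a))),
        out=lambda s, a: g2.out(s[1], g1.out(s[0], a)),
    )


def m2(seq, mul):
    """M2: merge the first two items of a sequence; shorter ones are unchanged."""
    return seq if len(seq) < 2 else [mul(seq[0], seq[1])] + seq[2:]


def running_sum_out(s, batch):
    result = []
    for x in batch:
        s += x
        result.append(s)
    return tuple(result)


# Hypothetical example transducer on batches (tuples) of integers:
# it emits 0 first, then the running sum of everything seen so far.
RUNNING_SUM = Transducer(
    init=0, o=(0,), next=lambda s, b: s + sum(b), out=running_sum_out
)


def concat(x, y):
    return x + y


G = serial(RUNNING_SUM, RUNNING_SUM, concat)
xs = [(1, 2), (3,)]
lhs = denotation(G, xs)
rhs = m2(denotation(RUNNING_SUM, denotation(RUNNING_SUM, xs)), concat)
assert lhs == rhs
```

The merge performed by M2 accounts for the extra initial output: feeding the length n+1 output of G1 into G2 yields n+2 items, the first two of which together form the initial output o of the serial composition.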


Let us give an example of how to construct complex computations from simpler
ones using the dataflow combinators. Let $A, B$ be sets and $\mathit{op} : A \to B$
be a function. We want to describe a streaming computation with two input
channels, both of type FBag(A), and one output channel of type FBag(B).
The computation transforms both input channels in the same way, namely by
applying the function op to each element. This gives two output substreams,
both of type FBag(B), that are merged into the output stream. The function
$\mathit{op} : A \to B$ lifts to a monoid homomorphism $\mathit{op} : \mathrm{FBag}(A) \to \mathrm{FBag}(B)$, given
by $\mathit{op}(x) = \{\mathit{op}(a) \mid a \in x\}$ for every multiset $x$. The streaming computation
described previously can be visualized using the dataflow graph shown below.


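[Figure: dataflow graph. Two input channels of type FBag(A) each pass through a Lift(op) node; the two resulting channels of type FBag(B) are merged into one output channel of type FBag(B).]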


Each edge of the graph represents a communication channel along which a stream 
flows, and it is annotated with the type of the stream. The dataflow graph 
above represents the transducer G = Serial(Par(Lift(op), Lift(op)), Merge), 
where Merge : $\mathcal{G}(\mathrm{FBag}(A) \times \mathrm{FBag}(A), \mathrm{FBag}(A))$ is the transducer of Example 16.
From Propositions 27, 28, and 29 we obtain that G implements the transduction
$(\mathrm{lift}(\mathit{op}) \parallel \mathrm{lift}(\mathit{op})) \gg \mathrm{merge}$, where merge is described in Example 11.
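
The construction G = Serial(Par(Lift(op), Lift(op)), Merge) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: finite bags are modelled with Python's `collections.Counter`, the `Transducer` record and helper names are hypothetical, and Merge (Example 16, not reproduced here) is assumed to be the stateless transducer that emits the bag union of its two inputs. The combinators follow the definitions of Table 1.

```python
from collections import Counter
from typing import Any, Callable, NamedTuple


class Transducer(NamedTuple):
    init: Any
    o: Any
    next: Callable[[Any, Any], Any]
    out: Callable[[Any, Any], Any]


def run(g, inputs):
    """Denotation of g: the initial output, then one output per input item."""
    s, outputs = g.init, [g.o]
    for x in inputs:
        outputs.append(g.out(s, x))
        s = g.next(s, x)
    return outputs


def lift(h, unit):
    """Lift(h): stateless transducer for a monoid homomorphism h."""
    return Transducer(init=(), o=h(unit), next=lambda s, x: s, out=lambda s, x: h(x))


def par(g1, g2):
    """Par(G1, G2): both components run independently on paired inputs."""
    return Transducer(
        init=(g1.init, g2.init),
        o=(g1.o, g2.o),
        next=lambda s, x: (g1.next(s[0], x[0]), g2.next(s[1], x[1])),
        out=lambda s, x: (g1.out(s[0], x[0]), g2.out(s[1], x[1])),
    )


def serial(g1, g2, mul):
    """Serial(G1, G2): the output of g1 is fed to g2; mul is the output product."""
    return Transducer(
        init=(g1.init, g2.next(g2.init, g1.o)),
        o=mul(g2.o, g2.out(g2.init, g1.o)),
        next=lambda s, a: (g1.next(s[0], a), g2.next(s[1], g1.out(s[0], a))),
        out=lambda s, a: g2.out(s[1], g1.out(s[0], a)),
    )


def bag_map(op):
    """Lift op : A -> B to a homomorphism on bags, keeping multiplicities."""
    def h(bag):
        result = Counter()
        for a, count in bag.items():
            result[op(a)] += count
        return result
    return h


def bag_union(x, y):
    return x + y


# Assumed shape of Merge: bag union is itself a homomorphism on pairs of bags.
merge = lift(lambda pair: pair[0] + pair[1], (Counter(), Counter()))
upper = lift(bag_map(str.upper), Counter())
G = serial(par(upper, upper), merge, bag_union)

inputs = [(Counter("ab"), Counter("a")), (Counter(), Counter("c"))]
outputs = run(G, inputs)
```

With op = `str.upper`, the first round turns the bags {a, b} and {a} into {A, A, B}, and the second round emits {C}; the initial output is the empty bag.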

We will now consider the feedback combinator, which introduces cycles in 
the dataflow graph. One consequence of cyclic graphs in the style of Kahn- 
MacQueen is that divergence can be introduced, that is, a finite amount 
of input can cause an operator to enter an infinite loop. For example, consider 
the transducer Merge : $\mathcal{G}(\mathrm{FBag}(A) \times \mathrm{FBag}(A), \mathrm{FBag}(A))$ of Example 16. The
figure below visualizes the dataflow graph, where the output channel of Merge 
is connected to one of its input channels, thus forming a feedback loop. 


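[Figure: dataflow graph with a feedback loop. An input channel of type FBag(A) enters Merge; the output channel of Merge, of type FBag(A), is also connected back to the second input channel of Merge, of type FBag(A).]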


Suppose that the singleton input {a} is fed to the input of the dataflow graph 
above, which corresponds to the first input channel of Merge. This will cause 
Merge to emit {a}, which will be sent again to the second input channel of Merge. 
Intuitively, this will cause the computation to enter an infinite loop (divergence) 
of consuming and emitting {a}. This behavior is undesirable in systems that 
process data streams, because divergence can make the system unresponsive. For 
this reason, we will consider here a form of feedback that eliminates this problem 
by ensuring that the computation of a feedback loop proceeds in a sequence of 
rounds. This avoids divergence, because the computation always makes progress
by moving from one round to the next, as dictated by the input data. We describe 
this organization in rounds by requiring that the programmer specifies a splitter 
(recall Example 18). The splitter decomposes the input stream into batches,
and one round of computation for the feedback loop corresponds to consuming 
one batch of data, generating the corresponding output batch, and sending the 
output batch along the feedback loop to be available for the next round of 
processing. This form of feedback allows flexibility in specifying what constitutes 
a single batch (and thus a single round), and therefore generalizes the feedback
combinator of Synchronous Languages such as Lustre.


Proposition 30 (Feedback Composition). Let $A$ and $B$ be monoids, $(\beta, \mu) :
\mathrm{STrans}(A \times B, B)$ be a transduction, $G : \mathcal{G}(A \times B, B)$ be a transducer, and $r = (r_1, r_2)$
be a splitter for $A$ (see Example 13).

(1) IMPLEMENTATION: If $G$ implements $(\beta, \mu)$, then $\mathrm{Loop}(G, r)$ implements $\mathrm{loop}(\beta, \mu, r)$.
(2) COHERENCE: If $G$ is coherent, then so is $\mathrm{Loop}(G, r)$.


Proof. We leave to the reader the proofs that Split (Example 18) implements
split and that Flatten (Example 17) implements flatten. Given Proposition 29,
it suffices to show that $G' = \mathrm{LoopB}(G)$ implements $(\gamma, \nu) = \mathrm{loopB}(\beta, \mu)$. Since
$G'$ is of type $\mathcal{G}(\mathrm{FSeq}(A), \mathrm{FSeq}(B))$, it suffices to define the transition and output
functions on singleton sequences (as done in Table 1), because there is a unique
way to extend them so that $G'$ is coherent. It remains to show that $[\![G']\!](\bar{x}) =
F(\gamma, \nu)(\bar{x})$ for every $\bar{x} \in \mathrm{FSeq}(A)^*$. The base case is easy, and for the step case it
suffices to show that $\mathrm{out}'(\mathrm{gnext}'(\mathrm{init}', \bar{x}), y) = \nu(\pi(\bar{x}), y)$ for every $\bar{x} \in \mathrm{FSeq}(A)^*$
and $y \in \mathrm{FSeq}(A)$. As we discussed before, $\mathrm{gnext}'$ and $\mathrm{out}'$ can be viewed as being
defined on elements of $A$ rather than sequences of $\mathrm{FSeq}(A)$, so we can equivalently
prove that $\mathrm{out}'(\mathrm{gnext}'(\mathrm{init}', \langle a_1, \ldots, a_n \rangle), a_{n+1}) = \nu(\langle a_1, \ldots, a_n \rangle, a_{n+1})$ with each
$a_i$ an element of $A$. Given that $G$ implements $(\beta, \mu)$, the key observation to finish
the proof is $\mathrm{gnext}'(\mathrm{init}', \langle a_1, \ldots, a_n \rangle) = (\mathrm{gnext}(\mathrm{init}, \langle (a_1, b_0), \ldots, (a_n, b_{n-1}) \rangle), b_n)$,
where $\gamma(\langle a_1, \ldots, a_n \rangle) = \langle b_0, b_1, \ldots, b_n \rangle$.


Example 31. For an example of using the feedback combinator, consider the
transduction $(\beta, \mu)$ which adds two input streams of numbers pointwise. That
is, $\beta : \mathrm{FSeq}(\mathbb{N}) \times \mathrm{FSeq}(\mathbb{N}) \to \mathrm{FSeq}(\mathbb{N})$ is defined by $\beta(x_1 x_2 \ldots x_m, y_1 y_2 \ldots y_n) =
0\,(x_1 + y_1)(x_2 + y_2) \ldots (x_k + y_k)$, where $k = \min(m, n)$. Additionally, consider
the trivial splitter $r = (r_1, r_2)$ for sequences where each batch is a singleton:
$r_1(x_1 \ldots x_n) = \langle x_1, \ldots, x_n \rangle$ and $r_2(x_1 \ldots x_n) = \varepsilon$. We use this splitter to enforce
that each batch is a single element and that each round of the computation
involves consuming one element. Finally, the transduction $\mathrm{loop}(\beta, \mu, r) = (\gamma, \nu)$
describes the running sum, that is, $\gamma(x_1 \ldots x_n) = 0\, x_1 (x_1 + x_2) \ldots (x_1 + \cdots + x_n)$.
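
The rounds of Example 31 can be traced with a short program. This is a sketch under assumptions, not the paper's code: batches are modelled as Python tuples, the pointwise-addition transducer is written by hand (its state holds the elements of either input that still wait for a partner), and the names are hypothetical. The loop follows the LoopB definitions of Table 1, with the trivial splitter turning every input element into a singleton batch and the output flattened as it is produced.

```python
# G : transducer for pointwise addition of two streams of numbers.
G_INIT = ((), ())   # no pending elements on either input
G_O = (0,)          # initial output: beta(empty, empty) = 0


def g_step(s, ab):
    """Return (next(s, ab), out(s, ab)) for one pair of input batches."""
    left, right = s[0] + ab[0], s[1] + ab[1]
    k = min(len(left), len(right))
    emitted = tuple(x + y for x, y in zip(left, right))
    return (left[k:], right[k:]), emitted


def loop_running_sum(xs):
    """Run Loop(G, r) for the trivial splitter r on the input sequence xs."""
    s, b = G_INIT, G_O           # init' = (init, o)
    output = list(G_O)           # o' = <o>, flattened
    for x in xs:
        a = (x,)                 # one singleton batch per round
        s, b = g_step(s, (a, b)) # next' and out' of LoopB(G)
        output.extend(b)         # b is fed back in the next round
    return output


result = loop_running_sum([1, 2, 3, 4])
```

Each round consumes one element and the previous output batch, so the feedback loop always makes progress and cannot diverge.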


The dataflow combinators of this section could form the basis of query language
design. The StreamQRE language and related formalisms
are based on a set of combinators for efficiently processing linearly-ordered
streams (e.g., time series). Extending a language like StreamQRE to the
typed setting of stream transductions is an interesting research direction. 


6 Algebraic Reasoning for Optimizing Transformations 


Our typed denotational framework can be used to validate optimizing transfor- 
mations using algebraic reasoning. This amounts to establishing that the original 
transducer is equivalent to the optimized one. A fundamental approach for show- 
ing equivalence of composite transducers is to establish algebraic laws between 
basic building blocks, and then use algebraic rewriting. 
As a concrete example, consider the per-key streaming aggregation of Example 10,
which is described by the transduction $\mathrm{reduce}(K, \mathit{op}) : \mathrm{STrans}(\mathrm{FBag}(K \times
V), \mathrm{FMap}(K, V))$, where $K$ is the set of keys, $V$ is the set of values, and $\mathit{op} :
V \times V \to V$ is an associative and commutative aggregation operation. Let
$h : K \to \{1, \ldots, n\}$ be a hash function for the keys, and define $K^h_i = h^{-1}(i) =
\{k \in K \mid h(k) = i\}$ for every $i$. Consider two variants of the merging operation of
Example 11: (1) kmerge(h) merges $n$ input streams of types $\mathrm{FBag}(K^h_1 \times V), \ldots,
\mathrm{FBag}(K^h_n \times V)$ respectively into an output stream of type $\mathrm{FBag}(K \times V)$, and (2)
mmerge(h) merges $n$ input streams of types $\mathrm{FMap}(K^h_1, V), \ldots, \mathrm{FMap}(K^h_n, V)$
respectively into an output stream of type $\mathrm{FMap}(K, V)$. We also consider the
transduction ksplit(h) that partitions an input stream of type $\mathrm{FBag}(K \times V)$
into $n$ output substreams of types $\mathrm{FBag}(K^h_1 \times V), \ldots, \mathrm{FBag}(K^h_n \times V)$ respectively.
Using elementary set-theoretic arguments, the following equalities can be
established: $\mathrm{ksplit}(h) \gg \mathrm{kmerge}(h) = \mathrm{id}$ and

\[
\mathrm{kmerge}(h) \gg \mathrm{rd}(K, \mathit{op}) = (\mathrm{rd}(K^h_1, \mathit{op}) \parallel \cdots \parallel \mathrm{rd}(K^h_n, \mathit{op})) \gg \mathrm{mmerge}(h),
\]

where rd abbreviates reduce. Next, we consider the corresponding transducers
KSplit(h), KMerge(h), Id, Reduce(K′, op) (abbreviation Rd) and MMerge(h) and
establish that they implement the respective transductions. This can be shown
with induction proofs as shown earlier in Example 25 and Example 26. Using
these facts and the propositions of Sect. 5, the equalities between transductions
shown earlier give the following equations (equivalences) between transducers:
$\mathrm{KSplit}(h) \gg \mathrm{KMerge}(h) = \mathrm{Id}$ and

\[
\mathrm{KMerge}(h) \gg \mathrm{Rd}(K, \mathit{op}) = (\mathrm{Rd}(K^h_1, \mathit{op}) \parallel \cdots \parallel \mathrm{Rd}(K^h_n, \mathit{op})) \gg \mathrm{MMerge}(h).
\]

Using these equations, we can establish the following optimizing transformation
for data parallelization, which is useful when processing high-rate data streams.

\begin{align*}
\mathrm{Reduce}(K, \mathit{op}) &= \mathrm{Id} \gg \mathrm{Reduce}(K, \mathit{op}) \\
  &= \mathrm{KSplit}(h) \gg \mathrm{KMerge}(h) \gg \mathrm{Reduce}(K, \mathit{op}) \\
  &= \mathrm{KSplit}(h) \gg (\mathrm{Rd}(K^h_1, \mathit{op}) \parallel \cdots \parallel \mathrm{Rd}(K^h_n, \mathit{op})) \gg \mathrm{MMerge}(h).
\end{align*}


The above equation illustrates our proposed style of reasoning for establishing 
the soundness of optimizing streaming transformations: (1) prove equalities be- 
tween transductions using elementary set-theoretic arguments, (2) prove that 
the transducers (programs) implement the transductions (denotations) using 
induction, (3) translate the equalities between transductions into equivalences 
between transducers using the results of Sect. 5, and finally (4) use algebraic
reasoning to establish more complex equivalences. 
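
The data-parallel equation above can be tested on concrete data. The sketch below is an illustration only: it compares the cumulative results (the β components) of the sequential and the parallelized pipelines after all input has been consumed, it represents a bag of key-value pairs as a Python list, and it numbers the partitions 0, ..., n-1 rather than 1, ..., n. All function names are hypothetical.

```python
import operator


def reduce_by_key(pairs, op):
    """Cumulative result of reduce(K, op): aggregate the values of each key."""
    result = {}
    for k, v in pairs:
        result[k] = op(result[k], v) if k in result else v
    return result


def ksplit(pairs, h, n):
    """Partition the pairs into n substreams according to the hash of the key."""
    parts = [[] for _ in range(n)]
    for k, v in pairs:
        parts[h(k)].append((k, v))
    return parts


def mmerge(maps):
    """Merge maps whose key sets are pairwise disjoint."""
    merged = {}
    for m in maps:
        merged.update(m)
    return merged


def parallel_reduce(pairs, op, h, n):
    """KSplit(h) >> (Rd || ... || Rd) >> MMerge(h), evaluated on a finite input."""
    return mmerge([reduce_by_key(part, op) for part in ksplit(pairs, h, n)])


pairs = [("a", 1), ("bb", 2), ("a", 3), ("ccc", 4), ("bb", 5)]
h = lambda key: len(key) % 2   # a deterministic stand-in for a hash function
sequential = reduce_by_key(pairs, operator.add)
parallel = parallel_reduce(pairs, operator.add, h, 2)
```

The two results agree because every key lands in exactly one partition and op is associative and commutative, which is the invariant that the bag and map types make explicit.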

The example of this section is simple but illustrates two key points: (1) our 
data types for streams (monoids) capture important invariants about the streams 
that enable transformations, and (2) useful program transformations can be 
established with denotational arguments that require an appropriate notion of 
transduction. This approach opens up the possibility of formally verifying the 
wealth of optimizing transformations that are used in stream processing systems. 
Existing papers describe several of them, but use informal arguments that
rely on the operational intuition about streaming computations. Our approach 
here, on the other hand, relies on rigorous denotational arguments. 

The equational axiomatizations of arrows and traced monoidal categories 
are relevant to our setting, but would require adaptation. An interesting 
question is whether a complete axiomatization can be provided for the basic 
dataflow combinators of Sect. 5, similarly to how Kleene Algebra (KA)
and its extensions (as well as other program logics [65, 66, 78, 80, 82])
capture properties of imperative programs at the propositional level. We also 
leave for future work the development of the coalgebraic approach for 
reasoning about the equivalence of stream transducers. We have already defined 
a notion of bisimulation in Sect. 4, which could give an alternative approach for
proving equivalence using coinduction on the transducers. 


7 Related Work 


Sect. 1 contains several pointers to related literature for stream processing. In
this section, we will focus on prior work that specifically addresses aspects of 
formal semantics for streaming computation. 

The seminal work of Gilles Kahn is exemplary in its rigorous treatment 
of denotational semantics for a language of deterministic dataflow graphs 
of independent processes, which access their input channels using blocking read 
statements and the output channels using nonblocking write statements. The language
Lustre is a synchronous restriction of Kahn’s model, which introduces
the semantic idea of a clock for specifying the rate of a stream. Other notable
synchronous formalisms include the language Esterel [22, 28] and
the synchronous dataflow graphs of [24]. These formalisms are all deterministic,
in the sense that the output is determined purely by the input data.
Nondeterminism creates unavoidable semantic complications [80]. 

The CQL language is a streaming extension of a relational database 
language with additional constructs for time-based windowing. The denotational 
semantics of CQL can be reconstructed and greatly simplified within our 
framework using the notion of stream described in Example 7 (finite time-varying
multisets). There are several works that deal with the semantics of specific lan- 
guage constructs (e.g., windows), notions of time, punctuations and disordered 
streams, but do not give a mathematical description of the overall streaming
computation.


The literature on Functional Reactive Programming (FRP) 
is closely related to the deterministic dataflow formalisms men- 
tioned earlier. The main abstractions in FRP are signals and event sequences, 
which are linearly ordered data. Processing unordered data (e.g., multisets and 
maps) and extracting data parallelism (e.g., the per-key aggregation of Sect. 6)
require a data model that goes beyond linear orders. In particular, the axioms 
of arrows (often used in FRP) cannot prove the soundness of the optimizing 
transformation of Sect. 6, which requires reasoning about multisets.
The idea of using types to classify streams has been recently explored in
[85] (see also [13]), but only for a restricted class of types that correspond to partial
orders. No general abstract model of computation is presented in [85], and many
of the examples in this paper cannot be adequately accommodated.

The mathematical framework of coalgebras has been used to describe 
streams [98]. One advantage of this approach is that proofs of equivalence can 
be given using the proof principle of coinduction [96], which in many cases offers
a useful alternative to proofs by induction. This line of work mostly focuses on 
infinite sequences of elements, whereas here we focus on the transformation of 
streams of data that can be of various different forms (not just sequences). 

The idea to model the input/output of automata using monoids has appeared 
in the algebraic theory of automata and transducers. Monoids (non-free, e.g. 
A* x B*) have been used to generalize automata from recognizers of languages 
to recognizers of relations [45], which are sometimes called rational transducers
[100]. Our focus here is on (deterministic) functions, as models that recognize
relations can give rise to the Brock-Ackerman anomaly [80]. The automata
models (with inputs from a free monoid A*) most closely related to our stream
transducers are deterministic: Mealy machines [87], Moore machines [90], sequential
transducers [48, 95], and sub-sequential transducers [102]. The concept
of coherence that we introduce here (Definition 20) does not arise in these models,
because they do not operate on input batches. An algebraic generalization
of a deterministic acceptor is provided by a right monoid action $\delta : \mathrm{St} \times A \to \mathrm{St}$
(see page 231 of [100]), which satisfies the following properties for all $s \in \mathrm{St}$ and
$x, y \in A$: (1) $\delta(s, 1) = s$, and (2) $\delta(\delta(s, x), y) = \delta(s, xy)$. These properties look
similar to (N1) and (N2) of Definition 20. They are, however, too restrictive for
our stream transducers, as they would falsify Theorem 


8 Conclusion 


We have presented a typed semantic framework for stream processing, based 
on the idea of abstracting data streams as elements of algebraic structures 
called monoids. Data streams are thus classified using monoids as types. Stream 
transformations are modeled as monotone functions, which are organized by in- 
put/output type. We have adapted the classical model of string transducers to 
our setting, and we have developed a general theory of streaming computation 
with a formal denotational semantics. The entire technical development in this 
paper is constructive, and therefore lends itself well to formalization in a proof 
assistant such as Coq. Our framework can be used for the formalization
of streaming models, and the validation of subtle optimizations of streaming
programs (e.g., Sect. 6), such as the ones described in the literature. We have
restricted our attention in this paper to deterministic streaming computation, 
in the sense that the behaviors that we model have predictable and reproducible 
results. Nondeterminism causes fundamental semantic difficulties [30], and it is
undesirable in applications where repeatability is important. 
Abstract. Separation logic is a useful tool for proving the correctness of
programs that manipulate memory, especially when the model of memory 
includes higher-order state: Step-indexing, predicates in the heap, and 
higher-order ghost state have been used to reason about function point- 
ers, data structure invariants, and complex concurrency patterns. On 
the other hand, the behavior of system features (e.g., operating systems) 
and the external world (e.g., communication between components) is 
usually specified using first-order formalisms. In principle, the soundness 
theorem of a separation logic is its interface with first-order theorems, 
but the soundness theorem may implicitly make assumptions about how 
other components are specified, limiting its use. In this paper, we show 
how to extend the higher-order separation logic of the Verified Software 
Toolchain to interface with a first-order verified operating system, in 
this case CertiKOS, that mediates its interaction with the outside world.
The resulting system allows us to prove the correctness of C programs 
in separation logic based on the semantics of system calls implemented 
in CertiKOS. It also demonstrates that the combination of interaction 
trees + CompCert memories serves well as a lingua franca to interface 
and compose two quite different styles of program verification. 


Keywords: formal verification · verifying communication · modular verification
· interaction trees · VST · CertiKOS


1 Introduction 


Separation logic allows us to verify programs by stating pre- and postconditions 
that describe the memory usage of a program. Modern variants include reasoning 
principles for shared-memory concurrency, invariants of locks and shared data 
structures, function pointers, rely-guarantee-style reasoning, and various other 
interesting features of programming languages. To support these features, the 
“memory” that is the subject of their assertions is not just a map from addresses 
to values, but something more complex: it may contain “predicates in the heap” 
to allow reasoning about invariants attached to dynamically allocated objects 
such as semaphores, it may be step-indexed to allow higher-order assertions, and 
it may contain various forms of ghost state describing resources that exist only 
for the purposes of verification. The soundness proof of the logic then relates
these decorated heaps to the simple address-map view of memory used in the 
semantics of the target language. 

This works well as long as every piece of the system is verified with re- 
spect to decorated heaps, but what if we have multiple verification tools, some 
of which provide correctness results in terms of undecorated memory (or, still 
worse, memory with a different set of decorations)? To take advantage of the 
correctness theorem of a function verified with one of these tools, we will need 
to translate our decorated memory into an undecorated one, demonstrate that 
it meets the function’s undecorated precondition, and then take the memory 
output by the function and use it to reconstruct a decorated memory. In this 
paper, we demonstrate a technique to do exactly that, allowing higher-order 
separation logics (in this instance, the Verified Software Toolchain) to take advantage
of correctness proofs generated by other tools (in this case, the CertiKOS
verified operating system). This allows us to remove the separation-logic-level 
specifications of system calls from our trusted computing base, instead relying 
on the operating system’s proofs of its own calls. In particular, we are interested 
in functions that do more than just manipulate memory (which is separation 
logic’s specialty)—they communicate with the outside world, which may not 
know anything about program memory or higher-order state. 


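int main(void) {
  unsigned int n, d; char c;
  n = 0;
  c = getchar();
  while (n < 1000) {
    d = ((unsigned)c) - (unsigned)'0';  /* digit value; any non-digit gives d >= 10 */
    if (d >= 10) break;                 /* stop at the first non-digit character */
    n += d;                             /* running sum of the digits read so far */
    print_int(n);                       /* helper not shown in this figure */
    putchar('\n');
    c = getchar();
  }
  return 0;
}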


Fig. 1: A simple communicating program 


Consider the program in Figure 1. It repeatedly reads a digit from the 
console, adds it to the sum of the digits seen so far, and prints the current 
sum to the console. Although this is a very simple program, it is not a nat- 
ural fit for separation-logic-based verification tools, which model the behavior 
of C programs in terms of computation and memory rather than I/O. Sev- 
eral approaches have been suggested for reasoning about I/O in separation 
logic, for instance by Penninckx et al. [18] and Koh et al. [13]. Using the lat- 
ter approach, we might specify the behavior of getchar with the Hoare triple 
{ITree(r ← read;; k r)} x = getchar() {ITree(k x)}, relating the function call to
an external read event: the program before the call to getchar must have permission
to perform a sequence of operations beginning with a read, and after the
call it has permission to perform the remaining operations (with values that may 
depend upon the received value). By adding these specifications as axioms to 
VST’s separation logic, we can use standard separation logic techniques to prove 
the correctness of programs such as the one above. But when we compile and 
run this C program, putchar and getchar are not axiomatized functions; they 
are system calls provided by the operating system, which may have an effect 
on kernel memory, user memory, and of course the console itself. If we prove 
a specification of this C program using the separation logic rules for putchar 
and getchar, what does that tell us about the behavior of the program when it 
runs? For programs without external calls, we can answer this question with the 
soundness proof of the logic. To extend this soundness proof to programs with 
external calls, we must relate the pre- and postconditions of the external calls 
to both the semantics of C and their implementations in the operating system. 
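
The idea that a program holds permission to perform a sequence of external events, where later events may depend on received values, can be pictured with an informal model. The sketch below is only an analogy written for this purpose: it is not VST's Coq definition of interaction trees, and the generator encoding, the event tuples, and the names are assumptions. The program of Figure 1 is modelled as a Python generator that yields read and write events; the value sent back after a read plays the role of the received value that the continuation depends on.

```python
def digit_sum_program():
    """Generator model of the program in Figure 1.

    Yielding ("read",) asks the environment for one character. Yielding
    ("write", ch) emits one character to the console.
    """
    n = 0
    c = yield ("read",)
    while n < 1000:
        d = ord(c) - ord("0")
        if d < 0 or d >= 10:        # non-digit: leave the loop
            break
        n += d
        for ch in str(n) + "\n":    # print_int(n); putchar('\n')
            yield ("write", ch)
        c = yield ("read",)


def run(program, console_input):
    """Drive the program against scripted console input and record its trace."""
    trace, chars = [], iter(console_input)
    try:
        event = next(program)
        while True:
            trace.append(event)
            reply = next(chars) if event[0] == "read" else None
            event = program.send(reply)
    except StopIteration:
        pass                        # program finished or input exhausted
    return trace


trace = run(digit_sum_program(), "12x")
printed = "".join(event[1] for event in trace if event[0] == "write")
```

The recorded trace is a user-level view of the external behavior; relating such traces to the I/O actually performed inside the operating system is a separate obligation.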

In this paper, we describe a modular approach to proving soundness of a ver- 
ification system for communicating programs, including the following elements: 


— An extension of VST with support for generic ghost state. 

— A generic mechanism for reasoning about external communication in a higher- 
order separation logic, built on top of ghost state. 

— A technique for relating pre- and postconditions for external functions in 
higher-order separation logic to first-order specifications of the same func- 
tions in the verified operating system CertiKOS, with a general approach to 
“de-step-indexing” a certain class of step-indexed specifications. 

— A new notion of correctness of the implementation of external communi- 
cation, by relating user-level traces of external behavior to I/O operations 
inside the operating system. 


The result is the first soundness proof of a separation logic that can be extended 
with first-order specifications of system calls. All proofs are formalized in the 
Coq proof assistant. 

To understand the scope of our results, it is important to clarify exactly 
how much of CertiKOS we have brought into our proofs of correctness for C 
programs, and how much of a gap remains. The semantics on which we prove 
the soundness of our separation logic is the standard CompCert semantics of 
C, extended with the specifications of system calls provided by CertiKOS. Our 
model does not include the process by which CertiKOS switches from user mode 
to kernel mode when executing a system call, but rather assumes that CertiKOS 
implements this process so that the user cannot distinguish it from a normal 
function call. To prove this assertion rather than assuming it, we would need to 
transfer our soundness proof to the whole-system assembly-language semantics 
used by CertiKOS, and interface with not just CertiKOS’s system call specifica- 
tions but also its top-level correctness theorem. We discuss this last gap further 
in Section 7, but in summary, we prove that our client-side programs and OS-side 
system calls are correct, while assuming that CertiKOS correctly implements its 
transition between user mode and kernel mode. 
The rest of the paper proceeds as follows. In Section 2, we describe generic
ghost state in separation logic. In Section 3, we show how to encode the state 
of the outside world as ghost state that can only be changed through calls to 
external functions, allowing us to describe external communication in separation 
logic specifications. In Section 4, we use this approach to specify console I/O op- 
erations, and demonstrate the verification of a simple communicating program. 
In Sections 5 and 6, we describe the process of verifying the implementation of 
an external call, by first connecting its VST specification to a first-order speci- 
fication on memory and then relating that “dry” specification to the functional 
specification of the same call in CertiKOS. This allows us to state our central 
theorem, which guarantees that programs verified in VST run correctly given the 
CertiKOS system call specifications. In Section 7, we address the relationship 
between user-level events and the actual communication performed by the OS. 
In Sections 8 and 9, we review related work and summarize our results. 


2 Background: Ghost State in Separation Logic 


2.1 Ghost Algebras 


The fundamental insight behind ghost state is that if a mathematical object 
has the same basic properties as a separation logic heap, it can be injected 
into separation logic as a resource, even if it is not actually present in program 
memory. This insight was discovered independently by many people [4,3,19], and 
the “basic properties” required have been characterized in many ways: partial 
commutative monoids (PCMs), resource algebras, separation algebras, etc. They 
all include the idea that the ghost state must support an operator, often written 
as $\cdot$, for combining it in the same way heaps are combined by disjoint union, 
and they require that operator to have some of the properties of heap union 
(associativity, commutativity) but not all (for instance, it may be possible to 
combine two identical pieces of ghost state). Crucially, the operator $\cdot$ may be 
partial, so that the very existence of one piece of state means that another piece 
cannot possibly exist in the same program (just as ownership of one piece of the 
heap means that no other thread can hold the same piece). We follow Iris [11] 
in also including a validity predicate valid that marks out the elements of an 
algebra that represent well-formed ghost state. 

Ghost state appears in the logic in a new kind of assertion, which we write 
as own, asserting that the current thread owns a certain ghost resource. In the 
assertion own g a pp, g is an identifier (analogous to a location in the heap), a is 
an element of the underlying algebra, and pp is a predicate, allowing for a limited 
form of higher-order ghost state—for instance, we can store separation logic 
assertions in ghost state to implement global invariants. The key property of the 
own assertion is that separating conjunction on it corresponds to the $\cdot$ operator 
of the underlying algebra (see rule own_op in Figure 2). By defining different 
algebras with different operators, we can define different sharing protocols for 
the ghost state. For instance, if we only want to count the number of times 
some shared resource is used, the state may be a number and the operator 
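
$$\frac{a_1 \cdot a_2 = a_3}{\text{own}\ g\ a_3\ pp \Leftrightarrow \text{own}\ g\ a_1\ pp * \text{own}\ g\ a_2\ pp}\ \text{(own\_op)}$$

$$\frac{\text{fp\_update}\ a\ b}{\text{own}\ g\ a\ pp \Rrightarrow \text{own}\ g\ b\ pp}\ \text{(own\_update)}$$

$$\frac{P \Rrightarrow P' \qquad \{P'\}\ C\ \{Q'\} \qquad Q' \Rrightarrow Q}{\{P\}\ C\ \{Q\}}\ \text{(consequence)}$$


Fig. 2: Key separation logic rules for ghost state 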


may be addition; if we want to describe the pattern of sharing more precisely, 
as with ghost variables, the state may be a pair of the variable’s value and a 
fraction of ownership, with a guarantee that two fractions are only compatible 
if they agree on the value. More complex sharing patterns correspond to more 
complicated join operations; for instance, Jung et al. [11] showed that any acyclic 
state machine can be encoded as ghost state, with the join operation computing 
the closest common successor of two states. The ghost state is not explicitly 
referenced by program instructions, but it can be modified at any time via a 
frame-preserving update: ghost state a can be replaced with b as long as any 
third party’s ghost state c that is consistent with a is also consistent with b, 
formally expressed as $\text{fp\_update}\ a\ b \triangleq \forall c.\ a \mathbin{\#} c \Rightarrow b \mathbin{\#} c$, where we write $a \mathbin{\#} b$ to 
mean $\exists d.\ a \cdot b = d$, i.e., $a$ and $b$ are compatible pieces of ghost state. This frame- 
preserving update is embedded into the logic using a view-shift operator $\Rrightarrow$, as 
shown in rule own_update of Figure 2. 
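
As a small worked instance of this condition, consider the fractional ghost-variable algebra mentioned above. This is only a sketch under stated assumptions: elements are pairs $(q, v)$ of a fraction $0 < q \le 1$ and a value $v$, the product $(q_1, v_1) \cdot (q_2, v_2)$ is defined only when $v_1 = v_2$ and $q_1 + q_2 \le 1$, and the symbol $\#$ is used here for compatibility. Under these assumptions a full share can change its value freely, while a half share cannot:

```latex
\begin{align*}
a \mathbin{\#} c \;&\triangleq\; \exists d.\ a \cdot c = d \\
\text{fp\_update}\ (1, v)\ (1, w) \;&\text{ holds: no } c \text{ satisfies } (1, v) \mathbin{\#} c, \text{ so the condition is vacuous} \\
\text{fp\_update}\ (\tfrac{1}{2}, v)\ (\tfrac{1}{2}, w) \;&\text{ fails when } v \neq w:\ c = (\tfrac{1}{2}, v) \text{ gives } (\tfrac{1}{2}, v) \mathbin{\#} c \text{ but not } (\tfrac{1}{2}, w) \mathbin{\#} c
\end{align*}
```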


x = 0; 
acquire(l);  ||  acquire(l); 
x++;         ||  x++; 
release(l);  ||  release(l); 


Fig. 3: The increment example 


Figure 3 shows the canonical example of a program where ghost state in- 
creases the verification power of separation logic. Using concurrent separation 
logic as originally presented by O’Hearn [17], we can prove that the value of x 
at the end of the program is at least 0, but we cannot prove that it is exactly 2. 
This limitation comes from the fact that we can associate an invariant with the 
lock $l$, but that invariant cannot express progress properties such as a change 
in the value of x. We can get around this limitation by adding ghost state that 
captures the contribution of each thread to x, and then use the invariant to en- 
sure that the value of x is the sum of all contributions. (This approach is due to 
Ley-Wild and Nanevski [16].) We begin with ghost state that models the central 
operation of the program: 
Definition 1. The sum ghost algebra is the algebra $(\mathbb{N}, +, \lambda n.\ \text{True})$ of natural 
numbers with addition, in which every number is a valid element. 


Intuitively, the lock invariant should remember every addition to x, while each 
individual thread only knows its own contribution. This is actually an instance of 
a very general pattern: the reference pattern, in which one party holds a complete 
and correct “reference” copy of some ghost state, and one or more other parties 
hold possibly incomplete “partial” copies. Because the reference copy must al- 
ways be completely up to date, the partial copies cannot be modified without 
access to the reference copy. When all the partial copies are gathered together, 
they are guaranteed to accurately represent the state of the data structure. The 
reference ghost algebra is built as follows: 


Definition 2. Given a ghost algebra G, we define the positive ghost algebra on 
G, written pos(G), as an algebra whose carrier set is $(\Pi \times G) \cup \{\bot\}$, where $\Pi$ 
is a set of shares.⁴ An element of pos(G) is valid if it has a nonempty share, 
and the operator $\cdot$ is defined such that $(\pi_1, a_1) \cdot (\pi_2, a_2) = (\pi_1 \cdot \pi_2, a_1 \cdot a_2)$ and 
$a \cdot \bot = a$ for all $a$. 


The positive ghost algebra contains pairs of a nonempty share and an element 
of G, with join defined pointwise, representing partial ownership of an element 
of G. Total ownership of the element can be recovered by combining all of the 
pieces, obtaining a full share, and combining all of the G elements accordingly. 


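Definition 3. Given a ghost algebra G, let the reference ghost algebra on G, 
written ref(G), be the algebra $(\text{pos}(G) \times (G \cup \{\bot\}),\ \cdot,\ \{(p, r) \mid r = \bot \lor p \sqsubseteq r\})$, 
where $(p_1, r) \cdot (p_2, \bot) = (p_1 \cdot p_2, r)$, and $p \sqsubseteq r \triangleq \exists q.\ p \cdot q = (\top, r)$. 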


An element of the reference ghost algebra is a pair of a positive share of G 
(partial element) and an optional reference element of G, where the reference 
element is unique and indivisible, and the partial element must be completable 
to the reference element if one exists. This ensures that when all the shares are 
gathered, i.e., when the partial element is $(\top, a)$, then it exactly matches the 
reference element, but no changes can be made to the partial element without 
the reference element present. To more clearly relate elements of this algebra 
to their intended meanings, we write ref $r$ for the reference element $(\bot, r)$ and 
part $s$ $v$ for the partial element $((s, v), \bot)$. 

Now we can formalize our intuition about what each party knows about the 
sum. We let the lock invariant for $l$ be $\exists v.\ x \mapsto v * \text{own}\ g\ (\text{ref}\ v)$, and start each 
thread with a partial element $\text{part}\ \tfrac{1}{2}\ 0$. When each thread acquires the lock and 
increments x, it also uses the own_update rule to increment its partial ghost state. 
At the end of the program, we can combine the two partial elements to obtain 
$\text{part}\ \top\ 2$, which in combination with the lock invariant is sufficient to guarantee 
that the value of x is 2. This pattern can be used for a wide range of applications 


4 We use tree shares [1, Chapter 41] in the Coq proofs, but for simplicity of presentation 
in this paper we will use fractional shares: $\bot$ is the empty share, $\tfrac{1}{2}$ is a half share, 
and $\top$ is the full share. 
by replacing the sum algebra with one appropriate to the application or data 
structure in question. We will also make use of it later to model the state of the 
external world as a separation logic resource. 
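
The increment example can be replayed concretely. The following is a minimal C sketch, not the Coq development: it assumes fractional shares counted in halves (1 is a half share, 2 is the full share) and instantiates G with the sum algebra. The names `part_t`, `ref_elem`, `mk_part`, `mk_ref`, `part_join`, `ref_join`, and `ref_valid` are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Element of pos(sum). Assumption: shares are counted in halves. */
typedef struct {
    bool present;   /* false models the unit element of pos(G) */
    int share;      /* 1 = half, 2 = full; meaningful only if present */
    unsigned val;   /* element of the sum algebra */
} part_t;

/* Element of ref(sum): a partial element plus an optional reference. */
typedef struct {
    part_t part;
    bool has_ref;
    unsigned ref;   /* meaningful only if has_ref */
} ref_elem;

ref_elem mk_part(int share, unsigned val) {
    ref_elem e = { { true, share, val }, false, 0 };
    return e;
}

ref_elem mk_ref(unsigned r) {
    ref_elem e = { { false, 0, 0 }, true, r };
    return e;
}

/* Join in pos(sum): shares and values add; undefined past the full share. */
bool part_join(part_t a, part_t b, part_t *out) {
    if (!a.present) { *out = b; return true; }
    if (!b.present) { *out = a; return true; }
    if (a.share + b.share > 2) return false;
    out->present = true;
    out->share = a.share + b.share;
    out->val = a.val + b.val;
    return true;
}

/* Join in ref(sum): at most one operand may hold the reference element. */
bool ref_join(ref_elem a, ref_elem b, ref_elem *out) {
    if (a.has_ref && b.has_ref) return false;
    if (!part_join(a.part, b.part, &out->part)) return false;
    out->has_ref = a.has_ref || b.has_ref;
    out->ref = a.has_ref ? a.ref : b.ref;
    return true;
}

/* Validity: the partial element must be completable to the reference. */
bool ref_valid(ref_elem e) {
    if (e.part.present && (e.part.share < 1 || e.part.share > 2)) return false;
    if (!e.has_ref || !e.part.present) return true;
    if (e.part.share == 2) return e.part.val == e.ref; /* nothing left to add */
    return e.part.val <= e.ref;  /* the other half contributes ref - val */
}
```

With both half shares gathered, only a reference value equal to the sum of the contributions yields a valid element, which is the step that pins x to 2 in the example.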


2.2 Semantics of Ghost State 


To support the use of ghost state in a separation logic, we need to make two main 
changes in the construction of the logic. First, we need to extend the underlying 
model of the logic with ghost state: rather than being predicates on the heap, 
our assertions are now predicates on the combination of heap and ghost state. 
Once ghost state exists in the model, we can give semantics to the own assertion. 

Second, we need to change our definition of Hoare triples to allow for the 
possibility of frame-preserving updates to ghost state at any point in a program’s 
execution. In a ghost-free separation logic, we might define Hoare triples with 
respect to an operational semantics for the language as follows: 


$$\{P\}\ c\ \{Q\} \triangleq \forall h.\ P(h) \Rightarrow (c, h) \rightarrow^{*} (\text{done}, h') \Rightarrow Q(h')$$


where $(c, h) \rightarrow (c', h')$ means that the program $c$ executed with starting heap $h$ 
may take a step to a new program c’ with heap h’. For a step-indexed logic, it 
is more convenient to write this definition inductively: 


Definition 4 (Safety). A configuration (c,h) is safe for n steps with postcon- 
dition Q if: 


— n is 0, or 
— c has terminated and Q(h) holds to approximation (step-index) n, or 
— $(c, h) \rightarrow (c', h')$ and $(c', h')$ is safe for $n - 1$ steps with Q. 


We can then define {P} c {Q} (at step-index n) to mean that $\forall h.\ P(h) \Rightarrow (c, h)$ 
is safe for n steps with Q. 
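
The inductive shape of Definition 4 can be seen in a toy setting. The sketch below is an illustration only: it assumes a hypothetical language in which a program is just a count of pending `x++` commands and the heap is the single integer x, and it ignores the step-indexing of Q itself. The names `config`, `step`, `safe`, `x_is_two`, and `x_is_three` are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy configuration: remaining increments and the current value of x. */
typedef struct { unsigned pending; int x; } config;

typedef bool (*postcond)(int x);

/* One step of the toy semantics; returns false if already terminated. */
bool step(config c, config *next) {
    if (c.pending == 0) return false;
    next->pending = c.pending - 1;
    next->x = c.x + 1;
    return true;
}

/* Safe for n steps with postcondition q, following the three cases. */
bool safe(unsigned n, config c, postcond q) {
    config next;
    if (n == 0) return true;              /* step budget exhausted */
    if (c.pending == 0) return q(c.x);    /* terminated: check Q */
    return step(c, &next) && safe(n - 1, next, q);
}

bool x_is_two(int x) { return x == 2; }
bool x_is_three(int x) { return x == 3; }
```

Note that a small step budget can make a configuration safe even for a postcondition that fails on termination, since the budget runs out before termination is observed; the triple only becomes meaningful when it holds at every step index.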

Once we have added ghost state, our heap h is now a pair (h, g) of physical 
and ghost state, and between any two steps the ghost state may change. This 
leads us to a ghost-augmented version of safety. 


Definition 5 (Safety with Ghost State). A configuration (c,h,g) is safe for 
n steps with postcondition Q if: 


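— n is 0, or 
— c has terminated and Q(h, g) holds to approximation n, or 
— $(c, h) \rightarrow (c', h')$ and $\forall g_{\text{frame}}.\ g \mathbin{\#} g_{\text{frame}} \Rightarrow \exists g'.\ (g' \mathbin{\#} g_{\text{frame}} \land (c', h', g')$ is safe 
for $n - 1$ steps with Q). 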


The program must be able to continue executing under any $g_{\text{frame}}$ consistent 
with its current ghost state, but its choice of new ghost state $g'$ may depend on 
the frame. This quantifier alternation captures the essence of ghost state: the 
ghost state held by the program constrains any other ghost state held by the 
notional “rest of the system”, and may be changed arbitrarily in any way that 
does not invalidate that other ghost state. 
3 External State as Ghost State 


An I/O-performing program modifies the state of the outside world. We would 
like to treat this external state as a kind of ghost state, since it is not in the 
program’s memory and yet can be described by separation logic assertions. At 
the same time, we would emphatically not like to allow users to make arbitrary 
frame-preserving updates to external state: the external environment should have 
complete control of the external state, and the program should never be able to 
change it except by calling external functions. Furthermore, VST’s semantic 
model (used to prove soundness) already includes an external state element, a 
black box of arbitrary type that is carried around by the program and passed to 
the environment at each external call, allowing the effects of external calls to be 
stateful without explicitly representing their state in program memory. While 
this external state is present in the operational semantics of VST, prior to the 
changes we describe it could not be referred to by separation logic assertions and 
was never instantiated with anything other than the singleton type unit. In this 
section, we describe how we combine ghost state with the built-in external state 
to make the external state visible in the separation logic. 

Intuitively, external state is just another kind of shared resource, and we 
should be able to model it with a form of ghost state. However, one of the key 
features of ghost state is that programs can make arbitrary frame-preserving 
updates to it, while programs should never be able to modify external state. We 
can accomplish this using the reference ghost algebra of Section 2: the reference 
element ref a will be held by the external environment, while the program holds 
a partial element $\text{part}\ \top\ a$. This ensures that the program cannot make any 
frame-preserving updates without the reference element, which is only available 
when the program passes control to the external environment via an external 
call. It then remains to choose the underlying algebra G of the external state. 
Different applications may call for external state with different carrier sets and 
operations, but in the simplest case, the VST user will not want to split or 
combine the local copy of the external state⁶. In this case, they can pick a type 
Z and make G the exclusive ghost algebra for Z, which holds only an empty 
unit element and an indivisible ownership element, preventing the local copy 
from being divided. Then the user program holds an element $\text{part}\ \top\ a$ that 
cannot be divided or modified, but only passed to the external environment, 
where $a : Z$ is the current value of the external state. We encapsulate the ghost 
state construction in an assertion $\text{has\_ext}\ a = \text{own}\ 0\ (\text{part}\ \top\ a)$, where 0 is the 
identifier reserved for the external ghost state. Now, when verifying a program 
with external state, the user simply provides the starting state a, and receives 
in the precondition of the main function the assertion has_ext a, with no need to 
use or understand the ghost state mechanism. 
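
The exclusive ghost algebra is simple enough to write down directly. The following C sketch is an assumption-laden illustration rather than VST's definition: `int` stands in for the user's type Z, and the names `excl` and `excl_join` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Either the unit element or exclusive ownership of one value. */
typedef struct { bool owned; int val; } excl;

/* Join: the unit is neutral; two ownership elements never combine. */
bool excl_join(excl a, excl b, excl *out) {
    if (a.owned && b.owned) return false;
    *out = a.owned ? a : b;
    return true;
}
```

Because no two ownership elements join, an owned value cannot be split into two nontrivial pieces, which is why the local copy of the external state stays in one piece.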


5 Appel et al. [1] call this the external oracle, but we refer to it as simply “external 
state” to avoid confusion with the environment oracles of CertiKOS. 

6 One example of a use case that benefits from nontrivial external state structure is a 
multithreaded web server in which different threads serve different clients simulta- 
neously; in this case, each thread might have its own piece of the external state. 
On the back end, we must still modify VST’s semantics to connect the ghost 
state a to the actual external state, and to prevent the “ghost steps” of the 
semantics from changing the external state. Recall from Section 2 that in order 
for a non-terminated configuration (c,h,g) to be safe for a nonzero number 
of steps, it must be the case that $(c, h) \rightarrow (c', h')$ and 
$\forall g_{\text{frame}}.\ g \mathbin{\#} g_{\text{frame}} \Rightarrow \exists g'.\ g' \mathbin{\#} g_{\text{frame}} \land (c', h', g')$ is safe. To connect the external ghost state to a real 
external state $z$, we simply extend this definition to require that $g_{\text{frame}}$ include 
an element $(\bot, z)$ at identifier 0. This enforces the requirement that the value 
of the external ghost state always be the same as the value of the external 
state, and ensures that frame-preserving updates cannot change the value of the 
external state. Re-proving the separation logic rules of Verifiable C with this new 
definition of Hoare triple required only minor changes, since internal program 
steps never change the external ghost state. 

When the semantics reaches an external call, the call is allowed to make 
arbitrary changes to the state consistent with its pre- and postcondition, in- 
cluding changing the value of the external ghost state (as well as the actual 
external state). We can use has_ext assertions in the pre- and postcondition of 
an external function to describe how that function affects the external state. For 
instance, we might give a console write function the “consuming-style” specifica- 
tion {has_ext(write(v);; k)} write(v) {has_ext(k)}, stating that if before calling 
write(v) the program has permission to write the value v and then do the opera- 
tions in k, then after the call it is left with permission to do k. (We could reverse 
the pre- and postcondition for a “trace-style” specification, in which the external 
state records the history of operations performed by the program instead of the 
future operations allowed.) In this paper, we use interaction trees [13] as a means 
of describing a collection of allowed traces of external events. Interaction trees 
can be thought of as “abstract traces with binding”; for instance, we can write 
$x \leftarrow \text{read};;\ \text{write}(x + 1);;\ k\ x$ to mean “read a value, call it x, write the value 
x + 1, and then continue to do the actions in k using the same value of x.” 
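
The consuming-style specification of write can be mimicked by an executable check. This is a deliberately simplified sketch: it assumes the permitted behavior is a straight-line sequence of writes (no reads and no binding, so far less than an interaction tree), stored as a NUL-terminated string. The names `ext_state` and `ext_write` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* External state: the writes the program is still permitted to perform. */
typedef struct { const char *allowed; } ext_state;

/* Models {has_ext(write(v);; k)} write(v) {has_ext(k)}: the call is
   accepted only if write(v) is the next permitted event, and afterwards
   the state holds the continuation k. */
bool ext_write(ext_state *z, char v) {
    if (z->allowed[0] == '\0' || z->allowed[0] != v) return false;
    z->allowed++;
    return true;
}
```

A trace-style variant would instead append each performed event to a history, reversing the roles of pre- and postcondition.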

In the end, we have a new assertion has_ext on external state that works in 
exactly the way we expect: it can hold external state of any type, it cannot be 
modified by user code, it can be freely modified by external calls, it always has 
exactly the same value as the external state already present in VST’s semantics, 
and it exposes no ghost-state functionality to the user. If the user wants more 
fine-grained control over external state (for instance, to split it into pieces so 
multiple threads can make concurrent calls to external functions), they can define 
their own ghost algebra for the state and pass around part elements explicitly, 
but for the common case, has_ext provides seamless separation-logic reasoning 
about C programs that interact with an external environment. 


4 Verifying C Programs with I/O in VST 


Once we have separation logic specifications for external function calls, verifying 
a communicating program is no different from verifying any other program. We 
demonstrate this with the example program excerpted in Figure 1, shown in 

$\{\text{ITree}(\text{write\_list}(\text{decimal\_rep}'(i));;\ k)\}$
void print_intr(unsigned int i) {
  unsigned int q,r;
  if (i!=0) {
    q=i/10u;
    r=i%10u;
    print_intr(q);
    putchar(r+'0');
  }
}
$\{\text{ITree}(k)\}$

$\{\text{ITree}(\text{write\_list}(\text{decimal\_rep}(i));;\ k)\}$
void print_int(unsigned int i) {
  if (i==0)
    putchar('0');
  else print_intr(i);
}
$\{\text{ITree}(k)\}$

$\{\text{ITree}(c \leftarrow \text{read};;\ \text{main\_loop}(0, c))\}$
int main(void) {
  unsigned int n, d; char c;
  n=0;
  c=getchar();
  while (n<1000) {
    d = ((unsigned)c)-(unsigned)'0';
    if (d>=10) break;
    n+=d;
    print_int(n);
    putchar('\n');
    c=getchar();
  }
  return 0;
}
$\{\text{ITree}(\text{done})\}$

Fig. 4: A simple communicating program, with specifications for each function 


full in Figure 4. The print_intr function uses external calls to putchar to print 
the decimal representation of its argument, as long as that argument is nonzero; 
print_int handles the zero case as well. The main function repeatedly reads in 
digits using getchar and then prints the running total of the digits read so far. 
The ITree predicate is simply a wrapper around the has_ext predicate of the 
previous section (i.e., an assertion on the external ghost state), specialized to 
interaction trees on I/O operations. We can then write simple specifications for 
getchar and putchar, using interaction trees to represent external state: 


$\{\text{ITree}(x \leftarrow \text{read};;\ k\ x)\}\ x = \text{getchar}()\ \{\text{ITree}(k\ x)\}$ 
$\{\text{ITree}(\text{write}(x);;\ k)\}\ \text{putchar}(x)\ \{\text{ITree}(k)\}$ 


Next, we annotate each function with separation logic pre- and postcon- 
ditions; the program does not manipulate memory, so the specifications only 
describe the I/O behavior of each function. The effect of print_intr is to make a 
series of calls to putchar, printing the digits of the argument i as computed by 
the meta-level function decimal_rep’ (where $\text{write\_list}([i_0; i_1; \ldots; i_n])$ is an abbreviation 
for the series of outputs $\text{write}(i_0);;\ \text{write}(i_1);;\ \ldots;;\ \text{write}(i_n)$). When the 
value of i is 0, print_intr assumes that the number has been completely printed, 
so print_int adds a special case for 0 as the initial input. The specification for 
the main loop is a recursive sequence of read and write operations, taking the 
running total (which starts at 0) and the most recent input as arguments: 


$$\text{main\_loop}(n, d) \triangleq \text{if } n < 1000 \text{ then } \text{write\_list}(\text{decimal\_rep}(n + d));;\ c \leftarrow \text{read};;\ \text{main\_loop}(n + d, c) \text{ else done}$$


Using the specifications for putchar and getchar as axioms, we can easily prove 
the specifications of print_intr, print_int, and main. (The following sections show 
how we substantiate these axioms.) 
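
The difference between the two meta-level functions is easiest to see when they are computed. The sketch below is one plausible reading, not the Coq definitions: it assumes that decimal_rep’ yields the digits most significant first and yields nothing for 0, and that decimal_rep differs only in rendering 0 as the single digit '0'. The C names `decimal_rep_prime` and `decimal_rep` and the caller-supplied output buffer are choices made for this illustration.

```c
#include <assert.h>
#include <string.h>

/* decimal_rep'(i): digits of i, most significant first; empty for 0.
   Writes into out (the caller supplies at least 10 bytes for a 32-bit
   unsigned int) and returns the number of digits written. */
int decimal_rep_prime(unsigned int i, char *out) {
    int k;
    if (i == 0) return 0;
    k = decimal_rep_prime(i / 10u, out);
    out[k] = (char)('0' + i % 10u);
    return k + 1;
}

/* decimal_rep(i): as above, except that 0 is rendered as "0". */
int decimal_rep(unsigned int i, char *out) {
    if (i == 0) { out[0] = '0'; return 1; }
    return decimal_rep_prime(i, out);
}
```

This mirrors the split between print_intr and print_int: the recursive function treats 0 as "nothing left to print", and the wrapper supplies the special case.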


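$\{\text{ITree}(\ell \leftarrow \text{read\_list}(n);;\ k\ \ell) * \text{buf} \mapsto \_\}$ 
x = getchars(buf,n) 
$\{\exists vs.\ \text{length}(vs) = n \land x = n \land \text{ITree}(k\ vs) * \text{buf} \mapsto vs\}$ 


$\{\text{length}(vs) = n \land \text{ITree}(\text{write\_list}(vs);;\ k) * \text{buf} \mapsto vs\}$ 
putchars(buf,n) 
$\{\text{ITree}(k) * \text{buf} \mapsto vs\}$ 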


Fig. 5: Separation logic specifications for I/O calls with memory 


More complicated programs may manipulate memory as well as communicat- 
ing, and we can easily combine the two. For instance, if we want to read or write 
several characters in a single call, the standard C idiom is to pass a buffer in 
memory as an argument. Figure 5 shows the specifications for functions putchars 
and getchars in this style, where each function takes as arguments a buffer to 
hold the input/output and a number indicating the size of the buffer⁷. The pre- 
and postconditions of these functions now involve both the external state and 
a standard points-to assertion for the buffer. (Note that $\ell \leftarrow \text{read\_list}(n)$ is an 
abbreviation for the series of inputs $\ell_0 \leftarrow \text{read};;\ \ell_1 \leftarrow \text{read};;\ \ldots;;\ \ell_{n-1} \leftarrow \text{read}$.) 

Figures 6 and 7 show a variant of the previous program that uses these exter- 
nal functions with memory. The print_intr function now populates a buffer with 
the characters to be written and returns the length of the decimal representation 
of its argument (retval in the postcondition refers to the return value of the func- 
tion), while print_int makes a single call to putchars with the populated buffer. 
The main function now reads four characters at a time and then processes them 
one by one, ultimately producing the same output as the previous program. The 
specifications for putchars and getchars describe changes to both external state 
and memory, as shown in Figure 5. Proving the specifications for the functions in 
this program is not any more difficult than in the memoryless case: we define an 
interaction tree main_loop capturing the slightly different pattern of interaction 
in this program, and then apply the appropriate separation logic rule to each 
command. The external calls affect both memory and the ITree predicate, while 
all other commands affect only memory and local variables, as usual. 


7 While these are not standard POSIX I/O functions, they are close to the behavior 
of POSIX read/write, socket operations, and other common forms of I/O. 
$\{\text{length}(\text{decimal\_rep}'(i)) < \text{length}(\text{contents}) \land \text{buf} \mapsto \text{contents}\}$
int print_intr(unsigned int i, unsigned char *buf) {
  unsigned int q;
  unsigned char r;
  int k = 0;
  if (i!=0) {
    q=i/10u;
    r=i%10u;
    k = print_intr(q, buf);
    buf[k] = r+'0';
    return k + 1;
  }
  return k;
}
$\{\text{buf} \mapsto \text{contents}[0 \ldots (\text{retval} - 1) := \text{decimal\_rep}'(i)]\}$

$\{\text{ITree}(\text{write\_list}(\text{decimal\_rep}(i));;\ k)\}$
void print_int(unsigned int i) {
  unsigned char *buf = malloc(5);
  if (!buf) exit(1);
  int k;
  if (i==0){
    buf[0] = '0';
    buf[1] = '\n';
    k = 2;
  }
  else{
    k = print_intr(i, buf);
    buf[k] = '\n';
    k++;
  }
  putchars(buf, k);
  free(buf);
}
$\{\text{ITree}(k)\}$


Fig. 6: A communicating program with memory (part 1) 


5 Soundness of External-State Reasoning 


The soundness proof of VST [1] describes the guarantees that the Hoare-logic 
proof of correctness for a C program provides about the actual execution of that 
program. A C program P is represented as a list $P_1, \ldots, P_n$ of function definitions 
in CompCert Clight, a Coq representation of the abstract syntax of C. The 
program is annotated with a collection of function specifications (i.e., separation 
logic pre- and postconditions) $\Gamma = \Gamma_1, \ldots, \Gamma_n$, one for each function. We then 
prove that each $P_i$ satisfies its specification $\Gamma_i$, which we write as $\Gamma \vdash P_i : \Gamma_i$ 
(note that each function may call on the specification of any function, including 
itself). The soundness theorem of VST without external function calls is then: 


Theorem 1 (VST Soundness). Let P be a program with specification $\Gamma$. 
Suppose for every function $P_i$ there is a proof $\Gamma \vdash P_i : \Gamma_i$ that $P_i$ satisfies 
its specification. Then the main function of P can run according to the Comp- 
Cert Clight semantics for any number of steps without getting stuck, and if it 
terminates then it does so in a state that satisfies its postcondition. 


Proof. First, make a nonstandard, ownership-annotated, resource-annotated, step- 
indexed small-step semantics for Clight. Define Verifiable C’s Hoare triple as a 
shallowly embedded statement about safe executions in this “juicy” semantics. 
Then show that executions in the juicy semantics erase to corresponding safe 
executions in Clight’s standard “dry” small-step semantics. 
$\{\text{ITree}(cs \leftarrow \text{read\_list}(4);;\ \text{main\_loop}'(0, cs))\}$
int main(void) {
  unsigned int n, d; unsigned char c;
  unsigned char *buf;
  int i, j;

  n=0;
  buf = malloc(4);
  if (!buf) exit(1);
  i = getchars(buf, 4);
  while (n<1000) {
    for(j = 0; j < i; j++){
      c = buf[j];
      d = ((unsigned)c)-(unsigned)'0';
      if (d>=10) { free(buf); return 0; }
      n+=d;
      print_int(n);
    }
    i = getchars(buf, 4);
  }
  free(buf);
  return 0;
}
$\{\text{ITree}(\text{done})\}$



Fig. 7: A communicating program with memory (part 2) 


Corollary 1. Since null pointer dereferences, integer overflows, etc. are all 
stuck in CompCert’s small-step semantics, this means that a verified program 
will be free of all of these kinds of errors. 


This soundness theorem expresses the relationship between the juicy seman- 
tics described by VST’s separation logic and the dry semantics under which 
C programs actually execute⁸. The proof of correctness of a program gives us 
enough information to construct a corresponding dry execution for each juicy 
execution⁹. However, we may not have access to the code of external functions, 
and in some cases (e.g., system calls) they may not even be implemented in C. In 
this section, we generalize the soundness theorem to include external functions. 


8 Of course, a C program actually executes by running machine code, but the relation- 
ship between the dry C semantics and the semantics of assembly language is already 
proved in CompCert, as is assembly-to-machine language [20]. 

9 Theorem 1 blurs the line between juicy and dry by saying that a dry execution 
“terminates in a state that satisfies its postcondition”, where the postcondition is 
stated in separation logic. In the original proof of soundness [1], this is resolved by 
assuming that the postcondition of main is always true. The techniques we use in 
this section can also be applied to more refined specifications of main. 
In order to prove correctness of a C program with external calls in our separation 
logic, we must have a pre- and postcondition $\Gamma_i$ for each external function. 
At this level these specifications are taken as axioms, since we do not have access 
to the code of the external functions. To be able to describe the dry executions 
of programs that call these functions, we also need simpler specifications on dry 
states. Each dry external specification contains a pre- and postcondition for the 
function, which may refer to the memory state, arguments/return values, the 
external state, and a witness used to provide logical parameters to the pre- and 
postcondition. The core of our approach is to prove the correspondence between 
the juicy specification and the dry specification of each external function. 

If we can relate every juicy specification to a dry specification, then why 
bother with the juicy specifications at all? The answer is, not every function 
can be specified “dry.” Higher-order functions in object-oriented patterns, dy- 
namically created locks with self-referential resource invariants, and many other 
C programming patterns cannot be given simple first-order specifications. But 
the external functions that correspond to ordinary input/output can be given 
first-order specifications. Therefore, users can write higher-order object-oriented 
programs, in which the internal functions have (only) juicy specifications, so long 
as the external functions have (also) dry specifications. For instance, consider 
the specification of the putchars function from the previous section: 


$\{\text{length}(vs) = n \land \text{ITree}(\text{write\_list}(vs);;\ k) * \text{buf} \mapsto vs\}$ putchars(buf,n) 
$\{\text{ITree}(k) * \text{buf} \mapsto vs\}$ 


The pre- and postcondition each make one assertion about memory (that the 
buffer buf points to the string of bytes vs) and one assertion about the external 
state¹⁰ (that the interaction tree allows write_list(vs) followed by k before the 
call, and k afterward). The corresponding first-order specification on dry memory 
and external state is: 


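$$\text{Pre}((vs, k), (\text{buf}, n), m, z) \triangleq \text{length}(vs) = n \land z = (\text{write\_list}(vs);;\ k) \land \forall i < n.\ m(\text{buf} + i) = vs[i]$$
$$\text{Post}((vs, k), (\text{buf}, n), m_0, m, z) \triangleq m_0 = m \land z = k$$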


where (vs, k) is the witness (i.e., the parameters to the specification), buf and 
n are the arguments passed to the function, m is the current memory, z is 
the external state, and $m_0$ in the postcondition is the memory before the call 
(allowing us to state that memory is unchanged). Of the roughly 210 Linux 
system calls that are not Linux- or platform-specific, about 140 either fall into this 
pattern (including socket, console, and file I/O and memory allocation) or are simpler 
informational calls like gethostname that do not involve memory. 
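
Because the dry specification is first-order, it can be read as a pair of ordinary executable predicates. The sketch below assumes a hypothetical dry state in which memory is a flat 16-byte array, a pointer is an index into it, and the external state z is the string of writes still permitted; the names `dry_mem`, `witness`, `putchars_pre`, and `putchars_post` are invented, the in-bounds check is an addition, and the argument pair (buf, n) is dropped from the postcondition because it is unused there.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MEM_SIZE 16

/* Hypothetical dry memory: a flat array of bytes. */
typedef struct { unsigned char bytes[MEM_SIZE]; } dry_mem;

/* Witness (vs, k): the bytes to write and the permitted continuation. */
typedef struct { const char *vs; const char *k; } witness;

bool putchars_pre(witness w, size_t buf, size_t n,
                  const dry_mem *m, const char *z) {
    size_t len = strlen(w.vs);
    if (len != n || buf + n > MEM_SIZE) return false;  /* length(vs) = n */
    /* z = write_list(vs);; k */
    if (strlen(z) != len + strlen(w.k)) return false;
    if (strncmp(z, w.vs, len) != 0 || strcmp(z + len, w.k) != 0) return false;
    /* forall i < n. m(buf + i) = vs[i] */
    return memcmp(m->bytes + buf, w.vs, n) == 0;
}

bool putchars_post(witness w, const dry_mem *m0,
                   const dry_mem *m, const char *z) {
    /* m0 = m and z = k */
    return memcmp(m0->bytes, m->bytes, MEM_SIZE) == 0 && strcmp(z, w.k) == 0;
}
```

Nothing in these predicates mentions ownership, step indices, or ghost state, which is what makes them checkable against a CompCert-level memory.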

Once we have a juicy and a dry specification for a given external function, 
what is the relationship between them? Intuitively, if the juicy specification for a 
function f is $\{P_j\}\ f(\text{args});\ \{Q_j\}$, the Hoare logic proof for a program that calls 


10 ITree is actually an assertion on the external ghost state, which is connected to the 
true external state as described in Section 3, and is erased at the dry level. 
f guarantees that $P_j$ is satisfied before every call to f, and relies on $Q_j$ holding 
after each such call returns. To know that the program will run without getting 
stuck, on the other hand, we must know that the dry precondition $P_d$ is satisfied 
before each call, and we can assume that the dry postcondition $Q_d$ is satisfied 
after each return. So informally, we need to know that $P_j$ implies $P_d$ and that 
$Q_d$ implies $Q_j$. This cannot be a simple logical implication, however, because 
$P_j$ and $Q_j$ are predicates on juicy memories, while $P_d$ and $Q_d$ are predicates on 
dry memories. A juicy memory jm is a dependent triple $(m, \phi, pf)$, where $m$ is 
a dry memory, $\phi$ is a higher-order, step-indexed memory with ghost state, and 
pf is a proof of the relationship between $m$ and $\phi$. We can easily extract the dry 
memory $m$ from a juicy memory (we write this as dry(jm)), but there are many 
possible $\phi$’s that may correspond to a single $m$: we need to make decisions about 
ownership information and ghost state that is not present at the CompCert level. 

In order to relate the juicy and dry specifications, we must erase the juice from 
the precondition, $P_j \Rightarrow P_d$, and then reconstruct the juice in the postcondition, 
$Q_d \Rightarrow Q_j$. The key to this erasure is that, as explained above, the $P_j$ and $Q_j$ for 
external functions generally make only first-order assertions on memory (memory 
buffers passed to system calls don’t contain higher-order objects such as function 
pointers and locks). The rest of the memory is implicitly the frame, and will not 
be changed by the external call. For first-order predicates, erasure is injective, 
and the associated juicy memory can be uniquely reconstructed once the buffer 
has been modified. The frame can contain noninjective juice, but we can reuse 
the same juice in going from $Q_d \Rightarrow Q_j$ that we erased in going from $P_j \Rightarrow P_d$, 
since the external function does not modify the frame. In practice, the story is 
not quite so simple: the external function might allocate or free memory, the dry 
witness (used in $P_d$ and $Q_d$) must be derived from the juicy witness (used in $P_j$ 
and $Q_j$), and so on. We now formalize the details, culminating in Definition 6, 
the formal correspondence between juicy and dry specifications. 

First, we address the problem of reconstructing a juicy memory from a dry 
memory. While there are many juicy memories that correspond to a given Comp- 
Cert memory, it is easy to start with a (precondition) juicy memory and change it 
to reflect (postcondition) modifications to the associated dry memory, as long as 
those changes fall within certain limits. In particular, a memory location may be 
newly allocated or deallocated, or its value may be changed while staying at the 
same permission level, but its permissions should not otherwise be changed¹¹. If 
a dry specification ensures that memory is changed in only (at most) these ways, 
we say that it safely evolves memory. When a user adds a new set of external 
functions to VST, this safe evolution property will be one of their proof obliga- 
tions. As long as an external function satisfies a specification that safely evolves 
memory, we can always reconstruct the juicy memory after the call by modifying 
the original juicy memory to reflect the changes to the dry memory. This 
reconstruction captures the effects of the external call on the program’s memory; 
to reflect the changes to the external state, we must also set the external ghost 
state of the reconstructed juicy memory to match the external state returned 
by the call. We define a reconstruct operation such that reconstruct(jm, m, z) is 
a version of the juicy memory jm that has been modified to take into account 
the changes in the dry memory m and the external state z. 

11 Any function that interacts with memory through the standard interface of load, 
store, alloc, and free will fall within these limits; concurrency operations, such as 
acquiring or releasing a lock, may not, and proving that lock operations are correctly 
implemented is outside the scope of this work. 

Second, we need a way to transform a juicy witness into the corresponding 
dry witness. When a user adds a new external call to VST, they must provide a 
dessicate function that performs this transformation. Fortunately, the dessicate 
operation usually follows a simple pattern. Components of the witness that are 
not memory objects are generally identical in their juicy and dry versions. The 
frame is usually the only memory object in the juicy witness; while it is possible in 
VST to write a Hoare triple that quantifies over other memory objects explicitly, 
it is very unusual and runs counter to the spirit of separation logic. Similarly, the 
postcondition of the dry specification may refer to the memory state before the 
call (to express properties such as “this call stored value v at location ℓ”), but 
there is rarely a reason to refer to any other memory object. Thus, the dessicate 
operation for each function can simply discard the frame (juicy) memory and 
replace it with the dry memory from before the call. This standard dessicate 
operation works for all external functions shown in this paper. 
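
The reconstruct and dessicate operations described above can be illustrated on a toy model. The following Python sketch is not VST's definition: the representation of memories as dictionaries, the per-location "juice" strings, the "fresh" placeholder for newly allocated locations, and the (frame, rest) witness shape are all assumptions made for illustration.

```python
from typing import Any, Dict, NamedTuple, Tuple

Loc = int
DryMem = Dict[Loc, Tuple[str, int]]  # location -> (permission, value)


class JuicyMem(NamedTuple):
    dry: DryMem            # the underlying CompCert-style memory
    juice: Dict[Loc, str]  # hypothetical ownership annotation per location
    ext: Any               # external ghost state


def safely_evolves(old: DryMem, new: DryMem) -> bool:
    """Locations may appear or disappear, and values may change, but a
    location present before and after must keep its permission."""
    return all(old[loc][0] == new[loc][0] for loc in old.keys() & new.keys())


def reconstruct(jm: JuicyMem, m: DryMem, z: Any) -> JuicyMem:
    """Rebuild a juicy memory from jm after the dry memory became m and the
    external state became z."""
    if not safely_evolves(jm.dry, m):
        raise ValueError("dry memory did not evolve safely")
    # Surviving locations keep their old juice; new ones get a placeholder.
    juice = {loc: jm.juice.get(loc, "fresh") for loc in m}
    return JuicyMem(dry=dict(m), juice=juice, ext=z)


def dessicate(jm: JuicyMem, witness: Tuple[Any, Any]) -> Tuple[DryMem, Any]:
    """Standard pattern: drop the juicy frame from the witness and put the
    pre-call dry memory in its place; other components are unchanged."""
    _frame, rest = witness
    return (jm.dry, rest)


jm0 = JuicyMem(dry={1: ("W", 0), 2: ("R", 7)},
               juice={1: "full", 2: "half"},
               ext="write(5);; k")
# Location 1 is overwritten, location 2 is freed, location 3 is allocated.
jm1 = reconstruct(jm0, {1: ("W", 5), 3: ("W", 0)}, "k")
```

In this model a permission change on an existing location is the only thing that makes reconstruction fail, which mirrors the safe-evolution proof obligation placed on the dry specification.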

This leads to the following definition and theorem: 


Definition 6 (Juicy-Dry Correspondence). A juicy specification $(P_j, Q_j)$ 
and a dry specification $(P_d, Q_d)$ for an external function correspond if, for a 
suitable dessicate operation: 


- for all witnesses $w$, arguments $a$, external states $z$, and juicy memories $jm$, 
if $P_j(w, a, z, jm)$, then $P_d(\mathit{dessicate}(jm, w), a, z, \mathit{dry}(jm))$; and 

- for all witnesses $w$, arguments $a$, return values $r$, external states $z$, initial 
juicy memories $jm_0$, initial external states $z_0$, and dry memories $m$, if 
$P_d(\mathit{dessicate}(jm_0, w), a, z_0, \mathit{dry}(jm_0))$ and $Q_d(\mathit{dessicate}(jm_0, w), r, z, m)$, then 
$Q_j(w, r, z, \mathit{reconstruct}(jm_0, m, z))$. 


Theorem 2 (VST Soundness with External Functions). Let P be a program 
with n functions, calling also upon m external functions. The internal 
functions have (juicy) specifications $\Gamma_1 \ldots \Gamma_n$ and the external functions have 
(juicy) specifications $\Gamma_{n+1} \ldots \Gamma_{n+m}$. Suppose P is proved correct in Verifiable 
C, that is, there is a derivation $D \vdash P_1 : \Gamma_1, \ldots, P_n : \Gamma_n$. Let $D_{n+1}, \ldots, D_{n+m}$ be dry 
specifications that safely evolve memory, and that correspond to $\Gamma_{n+1} \ldots \Gamma_{n+m}$. 
Then the main function of P can run according to the CompCert C semantics, 
using D as the semantics of external function calls, for any number of steps 
without getting stuck, and if it terminates then it satisfies its postcondition. 


Proof. We extend the juicy semantics of Theorem 1 with a rule for external 
calls that uses their juicy pre- and postconditions, and then prove that execu- 
tions in this semantics erase to safe executions in the dry semantics, using the 
correspondence to relate juicy and dry behaviors of external calls. 

Although this theorem does not explicitly mention external communication, 
it implies that any I/O operations performed by P conform to the description of 
allowed communication in the specification of main. This follows from the fact 
that only external calls can change the external state, and only external calls can 
communicate with the outside world. Thus, if P performs a sequence of external 
function calls $f_1, \ldots, f_n$, the external communication performed by P must be 
consistent with the specifications $D_{f_1}, \ldots, D_{f_n}$. In the case of the examples above, 
this means that at any point in a program’s execution, its communication so far 
will be a prefix of the operations allowed by the initial ITree predicate, as desired. 

Proving the correspondence between the juicy and dry specifications is the 
primary proof burden for a VST user who wants to use a new external function 
in their program. Fortunately, this proof only needs to be done once per external 
function rather than once per program (as long as the original specification is 
general enough to be usable in many different programs), and soundness (Theo- 
rem 2) has been proved once and for all. As a result, a VST user can prove that 
their program with external calls runs correctly as follows: 


1. For each external function used in the program (that has not already been 
specified in VST), write a separation logic specification for that function. 

2. Prove correctness of the program in VST as usual using the separation-logic- 
level external specifications. 

3. For each external function used in the program (again, that has not already 
been specified), write a dry specification describing its effects on CompCert 
memories, and prove that the dry specification corresponds to the juicy spec- 
ification and safely evolves memory. 

4. Show immediately that the program runs correctly for any number of steps 
by applying Theorem 2. 


For instance, we have already seen the VST-level specifications for putchars 
and getchars, and used them to prove correctness of a simple program; we can 
complete the process with the following lemma. 


Lemma 1. The juicy specifications of putchars and getchars correspond to their 
dry specifications. 


As a result, we now know that the sample program in Figure 7 runs correctly for 
any implementation of putchars and getchars that satisfies their dry specifications. 


6 Connecting VST to CertiKOS 


In the previous section, we showed how to connect a step-indexed separation logic 
specification of an external function to a “dry” specification on non-step-indexed 
CompCert memories and external state. This gives us a correctness property for 
C programs with external functions, but it still treats the dry specifications of 
the external functions as axioms. In this section, we show how to discharge these 
axioms by connecting dry specifications to implementations of the corresponding 
functions in the verified operating system CertiKOS [7]. 

Definition serial_in (port : Z) (st : OSState) : OSState * Z := 
  (* read buffers, compare bits, etc *) 
  let new := st.(serial_oracle) st.(serial_trace) in 
  match new with 
  | SerialRecv data => 
    let (st', byte) := ... in (* manipulate data *) 
    (st'/[serial_trace := st.(serial_trace) ++ [new]], byte) 
  | ... (* handle other events *) end. 


Fig. 8: A specification of a serial driver 


6.1 CertiKOS Specifications 


In order to explain how to connect VST and CertiKOS specifications, we first 
summarize how their specification styles differ. In VST, a specification is a pre- 
and postcondition on the (step-indexed, ghost-state-augmented) memory state 
of a program. In CertiKOS, a specification is a function representing a state 
transition from the current OS state to a new one with an (optional) return value. 
The OS state is a record with fields for each piece of concrete or logical state that 
CertiKOS maintains, such as page table maps and console buffers. Specifications 
are organized into “Certified Abstraction Layers” [6], which can be independently 
proven to refine higher-level abstractions, and then composed with other layers 
to build more complex systems. The concrete CertiKOS kernel implementation, 
in C and assembly, is verified with respect to high-level specifications using this 
layer framework and the CompCert compiler. 


Because the specifications are pure, deterministic functions, something more 
is needed to model functions with externally visible effects such as I/O. To 
handle such functions, CertiKOS parameterizes specifications by “environment 
contexts” [8], which act as oracles that take a log of the events up to that point 
and return the next steps taken by the environment. Each oracle has a fixed set 
of events it can produce, along with a trace well-formedness invariant that it 
must preserve. For example, the oracle for modeling the behavior of the serial 
device can return events indicating the successful completion of a send or the 
arrival of some data, and it is assumed to only receive values that fit in a byte 
([0, 255]). Although any particular choice of oracle is a deterministic function, its 
implementation is completely opaque to the specification, so that proofs about 
the specification’s behavior hold given any oracle and environment state. 


As a concrete example, consider the abridged specification of part of the 
serial driver in CertiKOS (Figure 8). After some initial work, the specification 
needs to know what bits came in from the physical device, so it consults the 
oracle and branches based on the next serial event. If the next event is a receive, 
it manipulates the received data to extract a byte and returns it along with a 
new state in which the trace is updated to include the processed event. 
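
The oracle-parameterized style of specification described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the event encoding, the `scripted_oracle` helper, and the value `-1` returned for non-receive events are hypothetical and do not come from CertiKOS.

```python
from typing import Callable, List, NamedTuple, Tuple

Event = Tuple[str, int]                  # e.g. ("recv", byte) or ("send_done", 0)
Oracle = Callable[[List[Event]], Event]  # log so far -> next environment event


class OSState(NamedTuple):
    serial_trace: List[Event]


def serial_in(oracle: Oracle, st: OSState) -> Tuple[OSState, int]:
    """Pure transition: consult the oracle, record the event, return a byte."""
    new = oracle(st.serial_trace)
    kind, data = new
    st2 = OSState(st.serial_trace + [new])
    if kind == "recv":
        assert 0 <= data <= 255  # well-formedness assumed of the oracle
        return st2, data
    return st2, -1  # placeholder for "other events"; not from the paper


def scripted_oracle(script: List[Event]) -> Oracle:
    """One particular oracle: replay a fixed script, indexed by log length."""
    return lambda trace: script[len(trace)]


oracle = scripted_oracle([("recv", 65), ("send_done", 0)])
st1, byte1 = serial_in(oracle, OSState([]))
st2, byte2 = serial_in(oracle, st1)
```

Because `serial_in` depends on the environment only through the `oracle` argument, it stays a deterministic function, and any property proved for an arbitrary `oracle` holds for every environment behavior the oracle can encode.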

6.2 Relating OS and User State 


Definition serial_putc (c : Z) (st : OSState) : option (OSState * Z) := 
  let c' := c mod 256 in 
  if st.(ikern) && st.(init) && st.(ihost) then 
    if st.(drv_serial).(serial_exists) then 
      match st.(com1) with 
      | mkDevData (mkSerialState _ true _ _ txbuf nil false) _ ltx _ => 
        let cs := if c' =? CHAR_LF then [CHAR_LF;CHAR_CR] else [c'] in 
        Some (st/[com1/s/TxBuf := cs, 
                  serial_log := st.(serial_log) ++ [IOEvPutc c]], c') 
      | _ => None end 
    else Some (st, -1) 
  else None. 


$\mathit{Pre}(k, c, m, z) \triangleq (\mathit{write}(c);; k) \sqsubseteq z$ 
$\mathit{Post}(k, c, m_0, m, z) \triangleq m_0 = m \wedge z \sqsubseteq k$ 


Fig. 9: The core of the putchar system call vs. its dry specification 


User-level programs cannot directly interact with the outside environment, 
and must instead communicate through the OS using the system call interface 
it provides. System calls in CertiKOS are specified just like any other operation, 
i.e., as a state transition function. For each system call, we would like to relate its 
dry pre- and postcondition (as described in Section 5) to its functional specifica- 
tion in CertiKOS. The property we would like to prove is something like: for any 
initial state s, if the dry precondition holds for s, then the value v and state s' 
returned by the functional specification satisfy the dry postcondition. Combined 
with the correspondence between juicy and dry specifications, this implies that 
the system call specification correctly implements the behavior expected by the 
user program (as expressed by its separation logic specification in VST). How- 
ever, this property cannot be proven in its current form because the dry pre- 
and postconditions are predicates on CompCert memories and external state, 
which differ from CertiKOS’s state, much of which is invisible and irrelevant 
to the user program, as can be seen in Figure 9. Instead, we must restate the 
correctness property in terms of relations between the common elements of the 
two state representations. The key components to relate are the return value of 
the system call, the representation of the user program’s memory, and the model 
of external behaviors. The return value is a CompCert value in both systems, 
but the other two require additional work to translate between them. 

Although, like VST, the CertiKOS kernel uses the CompCert C semantics 
and memory model, user-process memory is represented as a flat physical ad- 
dress space rather than a set of disjoint blocks. The OS state also includes page 
tables to map virtual to physical addresses and a record of which addresses are 
allocated. Fortunately, aside from these differences, the flat memory model is 
quite similar to CompCert’s (see Figure 10). We assume the existence of a relation 
$R_{mem}$ that maps blocks to virtual addresses. Other than the restriction 
that blocks fit in the virtual address space and map to nonoverlapping regions, 
the exact mapping has no effect on the system call correctness, so it can be 
completely arbitrary. To relate a CompCert memory to a CertiKOS one, we define 
a relation inj(m, flat(s), ptbl(s)), which states that if a block and offset in the 
CompCert memory m is valid, then it contains the same data as the corresponding 
location (according to $R_{mem}$ and the page table) in the flat memory of the 
OS state s. Note that inj is parameterized by the page table to allow a system 
call to alter the address mapping, for example by allocating new memory. 


(* CertiKOS flat memory *) 
Inductive flatmem_val := 
| HUndef 
| HByte: byte -> flatmem_val. 

(* Map from address to value *) 
Definition flatmem := ZMap.t flatmem_val. 

(* CompCert memory *) 
Inductive memval := 
| Undef: memval 
| Byte: byte -> memval 
| Pointer: block -> int -> nat -> memval. 

(* Map from block and offset to value *) 
Record mem := mkmem { 
  mem_contents: PMap.t (ZMap.t memval); 
  ... }. 


Fig. 10: A comparison of CertiKOS flat memory and CompCert memory 
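
The inj relation described above can be made concrete on a small finite model. The Python sketch below is an illustration only: the dictionary encodings, the toy page size, the `translate` helper, and passing the block-to-address map `r_mem` as an explicit argument are all assumptions, not the definitions used in the development.

```python
from typing import Dict, Optional, Tuple

PAGE_SIZE = 16  # toy page size, chosen only to keep the example small

BlockMem = Dict[Tuple[int, int], int]  # valid (block, offset) -> byte
FlatMem = Dict[int, int]               # physical address -> byte
PageTable = Dict[int, int]             # virtual page -> physical page


def translate(ptbl: PageTable, vaddr: int) -> Optional[int]:
    """Map a virtual address to a physical one, or None if unmapped."""
    page, off = divmod(vaddr, PAGE_SIZE)
    if page not in ptbl:
        return None
    return ptbl[page] * PAGE_SIZE + off


def inj(m: BlockMem, flat: FlatMem, ptbl: PageTable, r_mem: Dict[int, int]) -> bool:
    """Every valid block/offset holds the same byte as the flat-memory
    location reached through r_mem (block -> base virtual address) and ptbl."""
    for (block, offset), value in m.items():
        paddr = translate(ptbl, r_mem[block] + offset)
        if paddr is None or flat.get(paddr) != value:
            return False
    return True


m = {(1, 0): 10, (1, 1): 11, (2, 0): 20}
r_mem = {1: 0, 2: 16}                    # block 1 at vaddr 0, block 2 at vaddr 16
ptbl = {0: 3, 1: 1}                      # virtual page 0 -> physical page 3, etc.
flat = {48: 10, 49: 11, 16: 20, 17: 99}  # address 17 is invisible to the user
related = inj(m, flat, ptbl, r_mem)
```

Note that flat-memory contents outside the image of the mapping (address 17 here) do not affect the relation, which is why several OS states can be related to the same CompCert memory.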

At the user level, the precondition contains an interaction tree (or similar 
external specification) that specifies the allowed external behaviors, and the 
postcondition contains a smaller tree that continues using the return value of 
the “consumed” actions. On the other hand, in CertiKOS, specifications begin 
with a trace of the events that have already happened and extend it with new 
events by querying the external environment. To reconcile these two views, we 
can first relate an interaction tree to a (possibly infinite) set of (possibly infinitely 
long) traces, each of which intuitively is the result of following one path in the 
tree. Then any trace allowed by the output interaction tree should be a suffix of 
a trace allowed by the input tree, and the difference between the two should be 
exactly the trace of events generated during the system call: 


Definition 7. We write consume(T, T’, tr) to mean that, if tr’ is a trace of T’, 
then tr ++ tr’ (concatenation of tr and tr’) is a trace of T. 
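
For finite trees, Definition 7 can be checked by enumeration. The following Python sketch assumes a simplified encoding that is not the one used in the paper: a tree is a nested dictionary from event labels to subtrees, the empty dictionary is a finished tree, and the traces of a tree are its root-to-leaf paths.

```python
from typing import Dict, Sequence, Set, Tuple

Tree = Dict[str, "Tree"]   # event label -> subtree; {} means "done"
Trace = Tuple[str, ...]


def traces(tree: Tree) -> Set[Trace]:
    """All root-to-leaf paths of a finite tree."""
    if not tree:
        return {()}
    return {(ev,) + rest for ev, sub in tree.items() for rest in traces(sub)}


def consume(t: Tree, t_prime: Tree, tr: Sequence[str]) -> bool:
    """Every trace of t_prime, prefixed by tr, must be a trace of t."""
    all_traces = traces(t)
    return all(tuple(tr) + tr2 in all_traces for tr2 in traces(t_prime))


# After writing "a", the program may read either 0 or 1.
after_write: Tree = {"read=0": {}, "read=1": {}}
initial: Tree = {"write(a)": after_write}

good = consume(initial, after_write, ["write(a)"])
wrong_event = consume(initial, after_write, ["write(b)"])
```

Here `good` holds because both continuations of `after_write` extend `write(a)` to a path of `initial`, while `wrong_event` fails because no path of `initial` starts with `write(b)`.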


Equipped with the relations defined above, we can define more precisely what 
it means for a system call to satisfy its dry specification. 


Definition 8 (Dry-Syscall Correspondence). A system call f with functional 
specification $O_f$ correctly implements a dry specification $(P_d, Q_d)$ if for 
any arguments $v$, CompCert memory $m$, interaction tree $T$, and OS state $s$, if 
$P_d(v, m, T)$, $\mathit{inj}(m, \mathit{flat}(s), \mathit{ptbl}(s))$, and $O_f(v, s) = (s', v', t_{new})$, then for all $m'$ 
such that $\mathit{inj}(m', \mathit{flat}(s'), \mathit{ptbl}(s'))$, there exists $T'$ such that $\mathit{consume}(T, T', t_{new})$ 
and $Q_d(v, v', m', T')$. 


That is, if f correctly implements a dry specification then for any state that 
satisfies the dry precondition $P_d$, we can inject the relevant piece of memory 
into an OS state s, apply the functional specification $O_f$, and then extract a 
resulting state that satisfies the dry postcondition $Q_d$. The inj relation may 
relate multiple CompCert memories to a given OS state (hence the universal 
quantification over the resulting memory m’), but all such memories must agree 
on the contents of all valid addresses, so the postcondition will usually hold for 
all m’ if it holds for any m’. 
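
To see how the pieces of Definition 8 fit together, the following Python sketch checks the definition at a single point for a toy putchar. Everything here is a simplifying assumption rather than the CertiKOS or VST definition: interaction trees are single paths (lists of events), inj is an identity mapping, and the pre- and postconditions only loosely follow the shape of Figure 9.

```python
from typing import Callable, Dict, List, Tuple

Event = Tuple[str, int]
Tree = List[Event]    # toy interaction tree with a single path
Mem = Dict[int, int]  # toy memory: address -> byte


def inj(m: Mem, s: dict) -> bool:
    """Stand-in for inj(m, flat(s), ptbl(s)) with an identity address map."""
    return m == s["flat"]


def consume(tree: Tree, tree2: Tree, tr: List[Event]) -> bool:
    """A single-path tree has exactly one trace, namely itself."""
    return tree == tr + tree2


def pre(c: int, m: Mem, tree: Tree) -> bool:
    """Toy dry precondition: the tree allows writing c next."""
    return bool(tree) and tree[0] == ("write", c % 256)


def post(c: int, ret: int, m0: Mem, m: Mem, tree2: Tree) -> bool:
    """Toy dry postcondition: memory unchanged, c returned."""
    return m == m0 and ret == c % 256


def putchar_os(c: int, s: dict):
    """Toy functional specification O_f: returns (s', v', t_new)."""
    ev = ("write", c % 256)
    return {"flat": s["flat"], "log": s["log"] + [ev]}, c % 256, [ev]


def implements_at(os_spec: Callable, c: int, m: Mem, tree: Tree, s: dict) -> bool:
    """Check the implication of Definition 8 at one (v, m, T, s)."""
    if not (pre(c, m, tree) and inj(m, s)):
        return True                  # premises fail, nothing to show
    s2, ret, t_new = os_spec(c, s)
    m2 = dict(s2["flat"])            # the only m' with inj(m', s') in this model
    tree2 = tree[len(t_new):]        # candidate witness for T'
    return consume(tree, tree2, t_new) and post(c, ret, m, m2, tree2)


def bad_os(c: int, s: dict):
    """A faulty specification that logs the wrong character."""
    ev = ("write", (c + 1) % 256)
    return {"flat": s["flat"], "log": s["log"] + [ev]}, c % 256, [ev]


tree = [("write", 104), ("write", 105)]
m = {0: 1}
s = {"flat": {0: 1}, "log": []}
ok = implements_at(putchar_os, 104, m, tree, s)
broken = implements_at(bad_os, 104, m, tree, s)
```

The faulty specification is rejected because the events it generates cannot be consumed from the input tree, which is exactly the role the consume premise plays in the definition.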


Theorem 3. Putchar and getchar in CertiKOS correctly implement their dry 
specifications. 


While this correspondence is specific to CertiKOS, we can adapt it to other 
verified operating systems by replacing the CertiKOS system call specification, 
user memory model, and external event representation with those of the other 
OS. For example, in the case of the seL4 microkernel [12], inj could be redefined to 
relate a CompCert memory to certain capability slots that represent the virtual 
memory, and the system call might send a message to a device driver running 
in another process. Despite these changes, most of the theorems in this paper 
aside from Theorem 3 would continue to hold with minor or no alterations. 


6.3 Soundness of VST + CertiKOS 


In Section 5, we described a correspondence between “juicy” separation logic 
specifications for external functions and “dry” CompCert-level specifications 
that is sufficient to guarantee that verified C programs behave correctly when 
run, as long as the external functions actually satisfy their dry specifications. 
Now we have seen how to prove that an external function satisfies its dry specifi- 
cation, by relating it to its CertiKOS specification. We combine these two proofs 
to get a stronger correctness property for programs that use CertiKOS system 
calls. This will also allow us to formalize the idea that at each point in a pro- 
gram’s execution, it has performed some prefix of the communication operations 
specified in its precondition. 

First, we define the semantics of programs with respect to the implementation 
of external functions: 


Definition 9 (OS Safety). Suppose that we have a set of external calls $F$ 
such that each $f \in F$ has a functional specification $O_f$. Then a configuration 
$(c, m, t, T)$, where $c$ is a C program state, $m$ is a memory, $t$ is a trace 
of events performed so far, and $T$ is an interaction tree specifying the allowed 
future events, is safe for $n$ steps with respect to a set of traces $\mathcal{T}$ if: 


- $n$ is 0 and $\mathcal{T}$ is $\{\epsilon\}$, or 

- $(c, m) \rightarrow (c', m')$ and $(c', m', t, T)$ is safe for $n - 1$ steps with respect to $\mathcal{T}$, 
or 

- $c$ is at a call to an external function $f$ with arguments $v$, and for all $s$ consistent 
with $t$ such that $\mathit{inj}(m, \mathit{flat}(s), \mathit{ptbl}(s))$, if $O_f(v, s) = (s', v', t_{new})$, then 
there is some new interaction tree $T'$ such that $(c', m', t \mathbin{+\!\!+} t_{new}, T')$ is safe 
for $n - 1$ steps with respect to $\mathcal{T}'$, where $c'$ is the program state after the call 
(using the return value $v'$), $\mathit{inj}(m', \mathit{flat}(s'), \mathit{ptbl}(s'))$, and $\mathit{consume}(T, T', t_{new})$; 
and $\mathcal{T}$ is the union of $\{t_{new} \mathbin{+\!\!+} t \mid t \in \mathcal{T}'\}$ for all such $\mathcal{T}'$. 

The C program has states (c, m), where c holds the values of local variables 
and the control stack, and m is the memory. Our small-step relation $(c, m) \rightarrow (c', m')$ 
characterizes internal C execution, and therefore if c is at a call to an 
external function then $(c, m) \not\rightarrow (c', m')$. The operating system has states s that 
contain the physical memory flat(s) and many other components used internally 
by the OS (and its proof of correctness), including a trace of past events; we say 
that s is consistent with t when the trace in s is exactly t. 

Definition 9 has several important differences from our original definition of 
safety in Section 2. First, configurations include the trace $t$ of events performed 
so far, as well as $T$, the high-level specification of the allowed communication 
events (here it is taken to be an interaction tree, but it could easily be defined 
in another formalism just by changing the definition of consume). Second, our 
external functions are not simply axiomatized with pre- and postconditions, 
but implemented by the executable specifications $O_f$ provided by the operating 
system. We use the ideas of the previous section to relate the execution of C 
programs to the behavior of system calls: we inject the user memory into the OS 
state, extract the resulting memory from the resulting state, and require that the 
new interaction tree $T'$ reflect the communication events $t_{new}$ performed by the 
call. Note the quantification over the current OS state $s$: the details of the OS 
state, such as the buffer of values received, are unknown to the C program (and 
may change arbitrarily between steps, for instance, if an interrupt occurs), and 
so it must be safe under all possible OS states consistent with the events $t$. The 
set $\mathcal{T}$ contains all possible communication traces from the program’s execution, 
so by proving that every trace in $\mathcal{T}$ is allowed by the initial interaction tree $T$, 
we show that the program’s communication is always constrained by $T$. 


Lemma 2 (Trace Correctness). If $(c, m, T)$ is safe for $n$ steps with respect 
to $\mathcal{T}$, then for all traces $t \in \mathcal{T}$, there exists some interaction tree $T'$ such that 
$\mathit{consume}(T, T', t)$. 


Proof. By induction on n. Since the consume relation holds for the trace segment 
produced by each external call, it suffices to show that it is transitive, i.e., that 
$\mathit{consume}(a, b, t_1)$ and $\mathit{consume}(b, c, t_2)$ imply $\mathit{consume}(a, c, t_1 \mathbin{+\!\!+} t_2)$. 
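
The transitivity step follows directly from Definition 7. The derivation below is a sketch; $\mathrm{traces}(x)$ is a name introduced here (it does not appear in the original) for the set of traces of an interaction tree $x$.

```latex
\begin{align*}
&\text{Assume } \mathit{consume}(a, b, t_1) \text{ and } \mathit{consume}(b, c, t_2),
  \text{ and let } tr' \in \mathrm{traces}(c). \\
&t_2 \mathbin{+\!\!+} tr' \in \mathrm{traces}(b)
  && \text{by } \mathit{consume}(b, c, t_2) \\
&t_1 \mathbin{+\!\!+} (t_2 \mathbin{+\!\!+} tr') \in \mathrm{traces}(a)
  && \text{by } \mathit{consume}(a, b, t_1) \\
&(t_1 \mathbin{+\!\!+} t_2) \mathbin{+\!\!+} tr' \in \mathrm{traces}(a)
  && \text{by associativity of } \mathbin{+\!\!+}
\end{align*}
```

Since $tr'$ was an arbitrary trace of $c$, this is exactly $\mathit{consume}(a, c, t_1 \mathbin{+\!\!+} t_2)$.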


Theorem 4 (Soundness of VST + CertiKOS). Let P be a program with 
n functions, calling also upon m external functions. The internal functions have 
(juicy) specifications $\Gamma_1 \ldots \Gamma_n$ and the external functions have (juicy) specifications 
$\Gamma_{n+1} \ldots \Gamma_{n+m}$. Suppose P is proved correct in Verifiable C with initial 
interaction tree $T$. Let $D_{n+1}, \ldots, D_{n+m}$ be dry specifications that safely evolve 
memory, and that correspond to $\Gamma_{n+1} \ldots \Gamma_{n+m}$. Further, let each $D_i$ be correctly 
implemented by an OS function $f_i$ with executable specification $O_{f_i}$. Then for all 
n, the main function of P is safe for n steps with respect to some set of traces 
$\mathcal{T}$, and for every trace $t \in \mathcal{T}$, there exists some interaction tree $T'$ such that 
$\mathit{consume}(T, T', t)$. 


Proof. By the combination of the soundness of VST with external functions 
(Theorem 2), Lemma 2, and a proof relating our previous definition of safety to 
the new definition. 

This is our main result: by combining the results of the previous sections, we 
obtain a soundness theorem down to the operating system’s implementation of 
system calls, one that guarantees that the actual communication operations per- 
formed by the program are always a prefix of the initial specification of allowed 
operations. By instantiating the theorem with a set of verified system calls, we 
obtain a strong correctness result for our VST-verified programs, such as: 


Theorem 5. Let P be a program that uses the putchar and getchar system calls 
provided by CertiKOS, such as the one in Figure 4. Suppose P is proved correct 
with initial interaction tree $T$. Then for all n, the main function of P is safe 
for n steps with respect to some set of traces $\mathcal{T}$, and for every trace $t \in \mathcal{T}$, there 
exists some interaction tree $T'$ such that $\mathit{consume}(T, T', t)$. 


7 From Syscall-Level to Hardware-Level Interactions 


Thus far, we have assumed that the events in a program’s trace are exactly 
the events described in the user-level interaction tree 7. In practice, however, 
the communication performed by the OS may differ from that observed by the 
user. For example, like all operating systems, CertiKOS uses a kernel buffer of 
finite size to store characters received from the serial device; if the buffer is 
full, incoming characters are discarded without being read. To represent this 
distinction, we distinguish between the user-visible events produced by system 
calls, and external events, which are generated by the environment oracle and 
recorded in the trace at the time that they occur. For the system call events 
to be meaningful, they must correspond in some way to the external events, 
but this correspondence may not be one-to-one. In the case of console I/O, each 
character received by the serial device should be returned by getchar at most 
once, and in the order they arrived, but characters may be dropped. This leads us 
to the condition that the user events should be a subsequence of the environment 
events, which is proved in CertiKOS. 


Lemma 3. The getchar system call maintains the invariant that there exists an 
injective map from a system call event with value v in the OS trace to an external 
event with value v earlier in the trace. 
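
The invariant of Lemma 3, together with the ordering requirement stated in the preceding paragraph, can be checked on a concrete trace. The Python sketch below is illustrative only: the ("ext", v) and ("sys", v) event encoding is an assumption, and the check enforces the order-preserving (subsequence) reading, which is slightly stronger than the bare injectivity stated in the lemma.

```python
from typing import List, Tuple

Event = Tuple[str, int]  # ("ext", v): byte arrives; ("sys", v): getchar returns v


def getchar_invariant(trace: List[Event]) -> bool:
    """Each system call event must match a distinct, earlier external event
    with the same value, and matches must respect arrival order."""
    ext: List[int] = []   # values of external events seen so far
    next_unmatched = 0    # index of the first external event still usable
    for kind, v in trace:
        if kind == "ext":
            ext.append(v)
        elif kind == "sys":
            j = next_unmatched
            while j < len(ext) and ext[j] != v:
                j += 1    # external events skipped here count as dropped
            if j == len(ext):
                return False
            next_unmatched = j + 1
    return True


dropped_ok = getchar_invariant(
    [("ext", 97), ("ext", 98), ("ext", 99), ("sys", 97), ("sys", 99)])
duplicated = getchar_invariant([("ext", 97), ("sys", 97), ("sys", 97)])
too_early = getchar_invariant([("sys", 97), ("ext", 97)])
reordered = getchar_invariant(
    [("ext", 97), ("ext", 98), ("sys", 98), ("sys", 97)])
```

The first trace is accepted even though byte 98 is never returned, matching the observation that characters may be dropped; the other three are rejected because a byte is returned twice, returned before it arrives, or returned out of order.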


Corollary 2. Let P be a verified program as described in Theorem 4, in which 
getchar is the only system call performed. Then for all n, the main function of 
P is safe for n steps with respect to some set of traces $\mathcal{T}$, and for every trace 
$t \in \mathcal{T}$, there exists some interaction tree $T'$ such that $\mathit{consume}(T, T', t)$, and the 
events in $t$ correspond to external events performed as described in Lemma 3. 


Unlike Theorem 4, this corollary is specific to a particular system call, but it 
gives a stronger correctness property: the events in the user-level interaction tree 
are now interpreted in terms of actual bytes received by the OS, in the form of 
external events. Note that Lemma 3 does not require that every external event 
has a corresponding system call event; if the buffer fills up and characters are 
dropped before a getchar call, then there will be external events that do not cor- 
respond to anything in the interaction tree, and this is the intended semantics of 
buffered communication without flow control. A similar corollary can be proved 
for any set of system calls, but the precise correspondence between user events 
and external events will depend on the particular system calls involved. 

There is one more soundness theorem we might want to prove, asserting 
that the combined system of program and operating system executes correctly 
according to the assembly-level semantics of the OS. We should be able to obtain 
this theorem by connecting Theorem 4 with the soundness theorem of CertiKOS, 
which guarantees that the behavior of the operating system running a program 
P refines the behavior of a system K ⋈ P consisting of the program along 
with an abstract model of the operating system. However, this connection is 
far from trivial: it involves lowering our soundness result from C to assembly 
(using the correctness theorem of CompCert), modeling the switch from user to 
kernel mode (including the semantics of the trap instruction), and considering 
the effects of other OS features on program behavior (e.g., context switching). We 
estimate that we have covered more than half of the distance between VST and 
CertiKOS with our current result, but there is still work to be done to complete 
the connection. We can now remove the OS’s implementation of each system call 
from the trusted computing base; it remains to remove the OS entirely. 


8 Related Work 


The most comprehensive prior work connecting verified programs to the imple- 
mentation of I/O operations is that of Férée et al. [5] in CakeML, a functional 
language with I/O connected to a verified compiler and verified hardware. As in 
our approach, the language is parameterized by functional specifications for ex- 
ternal functions, backed by proofs at a lower level. However, while CakeML does 
support a separation logic [9], it is not higher-order, so all of the components are 
specified in the same basic style. Our approach could enable higher-order sepa- 
ration logic reasoning about CakeML programs. Ironclad Apps [10] also includes 
verified communicating code, for user-level networking applications running on 
the Verve operating system [21]. However, their network stack is implemented 
outside of the operating system, so proofs about I/O operations are carried out 
within the same framework as the programs that use the operations. 

One major category of system calls is file I/O operations. The FSCQ file 
system [2] is verified using Crash Hoare Logic, a separation logic which accounts 
for possible crashes at any point in a program. File system assertions are similar 
to the ordinary points-to assertions of separation logic, but may persist through 
crashes while memory is reset. In Crash Hoare Logic, the implementation-level 
model of the file state is the same as the user’s model, and the approach does 
not obviously generalize to other forms of external communication. 

Another related area is the extension of separation logic to distributed sys- 
tems, which necessarily involves reasoning about communication with external 
entities. The most closely related such logic is Aneris [14], which is built on 
Iris, the inspiration for VST’s approach to ghost state. The adequacy theorem 
of Aneris proves the connection between higher-order separation logic specifica- 
tions of socket operations and a language that includes first-order operational 
semantics for those functions. In our approach, this would correspond to directly 
adding the “dry” specifications for each operation to the language semantics, and 
building the correspondence proof for those particular operations into the sound- 
ness theorem of the logic; our more generic style of soundness theorem would 
make it easier to plug in new external calls. The bottom half of our approach 
(showing that the language-level semantics of the operations are implemented by 
an OS such as CertiKOS) could be applied to Aneris more or less as is. Another 
interesting feature of Aneris is that the communication allowed on each socket 
is specified by a user-provided protocol, an arbitrary separation logic predicate 
on messages and resources. In our examples thus far, we have assumed that the 
external world does not share any notion of resource with the program, and 
so our external state only mentions the messages to be sent and received; how- 
ever, the construction of Section 3 does allow the external state to have arbitrary 
ghost-state structure, which we could use to define similarly expressive protocols. 


9 Conclusion and Future Work 


We have now seen how to connect programs verified using higher-order separa- 
tion logic to external functions provided by a first-order verified system, effec- 
tively importing the results of outside verification (e.g. OS verification) into our 
separation logic. The approach consists of two halves: we first relate separation 
logic specifications for the external functions to “dry” first-order specifications 
on CompCert memories [15] and interaction trees [13], and then relate these dry 
specifications to the system that implements the functions (CertiKOS in our 
example). In the process, we interpret the C-level communication constraints in 
terms of OS-level events that more accurately represent the communication that 
occurs in the real world. Our approach works for any type of external commu- 
nication, and allows users to extend the system with new external functions as 
needed. Each new correspondence proof for an external function modularly ex- 
tends the soundness theorem of VST, removing the separation-logic specification 
of the function from the trusted computing base. 

The combination of CompCert memories with interaction trees has served 
as a robust specification interface between two quite different approaches to 
verification: VST’s higher-order impredicative concurrent separation logic, and 
CertiKOS’s certified concurrent abstraction layers. This strongly suggests that 
the combination of CompCert memories and interaction trees can serve as a 
lingua franca to interface with other verification systems for client programs 
and for operating systems. 


Abstract. Bernardy et al. [2018] proposed a linear type system $\lambda^q_\to$ as a 
core type system of Linear Haskell. In the system, linearity is represented 
by annotated arrow types $A \to_m B$, where m denotes the multiplicity 
of the argument. Thanks to this representation, existing non-linear code 
typechecks as it is, and newly written linear code can be used with 
existing non-linear code in many cases. However, little is known about 
the type inference of $\lambda^q_\to$. Although the Linear Haskell implementation 
is equipped with type inference, its algorithm has not been formalized, 
and the implementation often fails to infer principal types, especially for 
higher-order functions. In this paper, based on OutsideIn(X) [Vytiniotis 
et al., 2011], we propose an inference system for a rank 1 qualified-typed 
variant of $\lambda^q_\to$, which infers principal types. A technical challenge in this 
new setting is to deal with ambiguous types inferred by naive qualified 
typing. We address this ambiguity issue through quantifier elimination 
and demonstrate the effectiveness of the approach with examples. 


Keywords: Linear Types · Type Inference · Qualified Typing. 


1 Introduction 


Linearity is a fundamental concept in computation and has many applications. 
For example, if a variable is known to be used only once, it can be freely inlined 
without any performance regression [29]. In a similar manner, destructive updates 
are safe for such values without the risk of breaking referential transparency [32]. 
Moreover, linearity is useful for writing transformation on data that cannot be 
copied or discarded for various reasons, including reversible computation [19,35] 
and quantum computation [2,25]. Another interesting application of linearity is 
that it helps to bound the complexity of programs [1, 5, 13]. 

Linear type systems use types to enforce linearity. One way to design a 
linear type system is based on the Curry-Howard isomorphism to linear logic. For 
example, in Wadler's type system [33], functions are linear in the sense that their 
arguments are used exactly once, and any exception to this must be marked by 
the type operator (!). Such an approach is theoretically elegant but cumbersome 
in programming; a program usually contains both linear and unrestricted code, 
and many manipulations concerning (!) are required in the latter and around the 
interface between the two. Thus, several approaches have been proposed 
for more practical linear type systems [7, 21, 24, 28]. 

Among these approaches, a system called $\lambda^q_\to$, the core type system of Linear 
Haskell, stands out for its ability to have linear code in large unrestricted code 
bases [7]. With it, existing unrestricted code in Haskell typechecks in Linear 
Haskell without modification, and if one desires, some of the unrestricted code 
can be replaced with linear code, again without any special programming effort. 
For example, one can use the function append in an unrestricted context as 
$\lambda x.\ \mathsf{tail}\ (\mathsf{append}\ x\ x)$, regardless of whether append is a linear or unrestricted 
function. This is made possible by their representation of linearity. Specifically, 
they annotate a function type with its argument’s multiplicity (“linearity via 
arrows” [7]) as $A \to_m B$, where $m = 1$ means that a function of this type 
uses its argument linearly, and $m = \omega$ means that there is no restriction on 
the use of the argument, which includes all non-linear standard Haskell code. 
In this system, linear functions can be used in an unrestricted context if their 
arguments are unrestricted. Thus, there is no problem in using 
$\mathsf{append} : \mathsf{List}\ A \to_1 \mathsf{List}\ A \to_1 \mathsf{List}\ A$ as above, provided that $x$ is unrestricted. This promotion of 
linear expressions to unrestricted ones is difficult in other approaches [21, 24, 28] 
(at least in the absence of bounded kind-polymorphism), where linearity is a 
property of a type (called “linearity via kinds” in [7]). 

However, as far as we are aware, little is known about type inference for 
$\lambda^q_\to$. It is true that Linear Haskell is implemented as a fork¹ of the Glasgow 
Haskell Compiler (GHC), which of course comes with type inference. However, 
the algorithm has not been formalized and has limitations due to a lack of proper 
handling of multiplicity constraints. Indeed, Linear Haskell gives up handling 
complex constraints on multiplicities such as those with multiplications $p \cdot q$; as 
a result, Linear Haskell sometimes fails to infer principal types, especially for 
higher-order functions.² This limits the reusability of code. For example, Linear 
Haskell cannot infer an appropriate type for function composition to allow it to 
compose both linear and unrestricted functions. 

A classical approach that offers both a separate constraint-solving phase, which 
works well with the usual unification-based typing, and principal typing (for a rank 1 
fragment) is qualified typing [15]. In qualified typing, constraints on multiplicities 
are collected, and then a type is qualified with them to obtain a principal type. 
Complex multiplicities are not a problem in unification as they are handled by a 
constraint solver. For example, consider $\mathsf{app} = \lambda f.\lambda x.f\ x$. Suppose that $f$ has 
type $a \to_p b$, and $x$ has type $a$ (here we focus only on multiplicities). Let us write 
the multiplicities of $f$ and $x$ as $p_f$ and $p_x$, respectively. Since $x$ is passed to $f$, 
there is a constraint that the multiplicity $p_x$ of $x$ must be $\omega$ if the multiplicity $p$ 
of $f$’s argument also is. In other words, $p_x$ must be no less than $p$, which is 
represented by the inequality $p \le p_x$ under the ordering $1 \le \omega$. (We could represent 
the constraint as an equality $p_x = p \cdot p_x$, but using an inequality is simpler here.) 
For the multiplicity $p_f$ of $f$, there is no restriction because $f$ is used exactly once; 
linear use is always legitimate even when $p_f = \omega$. As a result, we obtain the 
inferred type $\forall p\, p_f\, p_x\, a\, b.\ p \le p_x \Rightarrow (a \to_p b) \to_{p_f} a \to_{p_x} b$ for app. This type is 
a principal one; intuitively, this is because only the constraints that are needed for 
typing $\lambda f.\lambda x.f\ x$ are gathered. Having separate constraint solving phases itself 
is rather common in the context of linear typing [3, 4, 11, 12, 14, 23, 24, 29, 34]. 
Qualified typing makes the constraint solving phase local and gives the principal 
typing property that makes typing modular. In particular, in the context of 
linearity via kinds, qualified typing is proven to be effective [11, 24]. 


1 https://github.com/tweag/ghc/tree/linear-types 
2 Confirmed for commit 1c80dcb424e1401f32bf7436290dd698c739d906 on May 14, 
2019. 

As qualified typing is useful in the context of linearity via kinds, one may 
expect that it also works well for linearity via arrows such as $\lambda^q_\to$. However, naive 
qualified typing turns out to be impractical for $\lambda^q_\to$ because it tends to infer 
ambiguous types [15, 27]. As a demonstration, consider a slightly different version 
of app defined as $\mathsf{app}' = \lambda f.\lambda x.\mathsf{app}\ f\ x$. Standard qualified typing [15, 31] infers 
the type 

\[ \forall q\, q_f\, q_x\, p_f\, p_x\, a\, b.\ (q \le q_x \wedge q_f \le p_f \wedge q_x \le p_x) \Rightarrow (a \to_q b) \to_{p_f} a \to_{p_x} b \]

by the following steps: 


— The polymorphic type of app is instantiated to $(a \to_q b) \to_{q_f} a \to_{q_x} b$ and 
yields a constraint $q \le q_x$ (again we focus only on multiplicity constraints). 

— Since $f$ is used as the first argument of app, $f$ must have type $a \to_q b$. Also, 
since the multiplicity of app’s first argument is $q_f$, there is a restriction on 
the multiplicity of $f$, say $p_f$, that $q_f \le p_f$. 

— Similarly, since $x$ is used as the second argument of app, $x$ must have type $a$, 
and there is a constraint on the multiplicity of $x$, say $p_x$, that $q_x \le p_x$. 


This inference is unsatisfactory, as the inferred type leaks internal details and 
is ambiguous [15, 27] in the sense that one cannot determine $q_f$ and $q_x$ from 
an instantiation of $(a \to_q b) \to_{p_f} a \to_{p_x} b$. Due to this ambiguity, the types of 
app and app’ are not judged as equivalent; in fact, the standard qualified typing 
algorithms [15, 31] reject $\mathsf{app}' : \forall p\, p_f\, p_x\, a\, b.\ p \le p_x \Rightarrow (a \to_p b) \to_{p_f} a \to_{p_x} b$. We 
conjecture that the issue of inferring ambiguous types is intrinsic to linearity via 
arrows because of the separation of multiplicities and types, unlike the case of 
linearity via kinds, where multiplicities are always associated with types. Simple 
solutions such as rejecting ambiguous types are not desirable as this case appears 
very often. Defaulting ambiguous variables (such as $q_f$ and $q_x$) to 1 or $\omega$ is not a 
solution either because it loses principality in general. 
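
The ambiguity in the type of app’ can be made concrete with a small brute-force check. The following Python sketch is only an illustration and not part of the inference system: it assumes an encoding of 1 and ω as the integers 0 and 1, and all function names are hypothetical. It lists the choices of q_f and q_x that satisfy the inferred constraint for fixed q, p_f, and p_x, and checks that existentially quantifying the two ambiguous variables leaves exactly q ≤ p_x, the constraint that appears in the type of app.

```python
from itertools import product

ONE, OMEGA = 0, 1            # assumed encoding: 1 ≤ ω becomes 0 ≤ 1
VALUES = (ONE, OMEGA)

def inferred(q, qf, qx, pf, px):
    """Constraint inferred for app': q ≤ q_x ∧ q_f ≤ p_f ∧ q_x ≤ p_x."""
    return q <= qx and qf <= pf and qx <= px

def witnesses(q, pf, px):
    """All (q_f, q_x) that satisfy the constraint for fixed q, p_f, p_x."""
    return [(qf, qx) for qf, qx in product(VALUES, repeat=2)
            if inferred(q, qf, qx, pf, px)]

def eliminated(q, pf, px):
    """Candidate constraint after eliminating q_f and q_x: q ≤ p_x."""
    return q <= px

def elimination_is_exact():
    """∃ q_f q_x. inferred(...) holds exactly when q ≤ p_x holds."""
    return all(bool(witnesses(q, pf, px)) == eliminated(q, pf, px)
               for q, pf, px in product(VALUES, repeat=3))
```

For q = 1 and p_f = p_x = ω, every one of the four choices of (q_f, q_x) is a witness, which is the sense in which an instantiation of the visible type does not determine them.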

In this paper, we propose a type inference method for a rank 1 qualified-typed 
variant of $\lambda^q_\to$, in which the ambiguity issue is addressed without compromising 
principality. Our type inference system is built on top of OutsideIn(X) [31], 
an inference system for qualified types used in GHC, which can handle local 
assumptions to support let, existential types, and GADTs. An advantage of using 
OutsideIn(X) is that it is parameterized over a theory X of constraints. Thus, 
applying it to linear typing boils down to choosing an appropriate X. We choose 
X carefully so that the representation of constraints is closed under quantifier 
elimination, which is the key to addressing the ambiguity issue. Specifically, in 
this paper: 


— We present a qualified typing variant of a rank-1 fragment of $\lambda^q_\to$ without 
local definitions, in which manipulation of multiplicities is separated from 
the standard unification-based typing (Sect. 2). 

— We give an inference method for the system based on gathering constraints 
and solving them afterward (Sect. 3). This step is mostly standard, except 
that we solve multiplicity constraints in time polynomial in their sizes. 

— We address the ambiguity issue by quantifier elimination under the assumption 
that multiplicities do not affect runtime behavior (Sect. 4). 

— We extend our technique to local assumptions (Sect. 5), which enables let 
and GADTs, by showing that the disambiguation in Sect. 4 is compatible 
with OutsideIn(X). 

— We report experimental results using our proof-of-concept implementation 
(Sect. 6). The experiments show that the system can infer unambiguous 
principal types for selected functions from Haskell’s Prelude, and performs 
well with acceptable overhead. 


Finally, we discuss related work (Sect. 7) and then conclude the paper (Sect. 8). 
The prototype implementation is available as a part of a reversible programming 
system SPARCL, available from https://bitbucket.org/kztk/partially-reversible-lang-impl/. 
Due to space limitations, we omit some proofs from this paper, which can be 
found in the full version [20]. 


2 Qualified-Typed Variant of $\lambda^q_\to$ 


In this section, we introduce a qualified-typed [15] variant of $\lambda^q_\to$ [7] for its 
rank 1 fragment, on which we base our type inference. Notable differences from the 
original $\lambda^q_\to$ include: (1) multiplicity abstractions and multiplicity applications 
are implicit (as are type abstractions and type applications), (2) this variant uses 
qualified typing [15], (3) conditions on multiplicities are inequality based [6], 
which gives better handling of multiplicity variables, and (4) local definitions 
are excluded, as we postpone their discussion to Sect. 5 due to their issues in the 
handling of local assumptions in qualified typing [31]. 


2.1 Syntax of Programs 

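Programs and expressions, which will be typechecked, are given below. 

\[
\begin{array}{lcl}
\mathit{prog} & ::= & \mathit{bind}_1; \ldots; \mathit{bind}_n \\
\mathit{bind} & ::= & f = e \mid f : A = e \\
e & ::= & x \mid \lambda x.e \mid e_1\ e_2 \mid C\ \overline{e} \mid \mathbf{case}\ e_0\ \mathbf{of}\ \{C_i\ \overline{x_i} \to e_i\}_i
\end{array}
\]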


A program is a sequence of bindings with or without type annotations, where 
bound variables can appear in following bindings. As mentioned at the beginning 
of this section, we shall postpone the discussions of local bindings (i.e., let) to 
Sect. 5. Expressions consist of variables $x$, applications $e_1\ e_2$, $\lambda$-abstractions $\lambda x.e$, 
constructor applications $C\ \overline{e}$, and (shallow) pattern matching 
$\mathbf{case}\ e_0\ \mathbf{of}\ \{C_i\ \overline{x_i} \to e_i\}_i$. For simplicity, we assume that constructors are fully applied and patterns 
are shallow. As usual, patterns $C_i\ \overline{x_i}$ must be linear in the sense that each variable 
in $\overline{x_i}$ is different. Programs are assumed to be appropriately $\alpha$-renamed so that 
variables newly introduced by $\lambda$ and patterns are always fresh. We do not require 
the patterns of a case expression to be exhaustive or non-overlapping, following 
the original $\lambda^q_\to$ [7]; the linearity in $\lambda^q_\to$ cares only for successful computations. 
Unlike the original $\lambda^q_\to$, we do not annotate $\lambda$ and case with the multiplicity of 
the argument and the scrutinee, respectively. 


\[
\begin{array}{lcll}
A, B & ::= & \forall \overline{p}\,\overline{a}.\ Q \Rightarrow \tau & \text{(polytypes)} \\
\sigma, \tau & ::= & a \mid D\ \overline{\mu}\ \overline{\tau} \mid \sigma \to_\mu \tau & \text{(monotypes)} \\
\mu & ::= & p \mid 1 \mid \omega & \text{(multiplicities)} \\
Q & ::= & \bigwedge_i \phi_i & \text{(constraints)} \\
\phi & ::= & M \le M' & \text{(predicates)} \\
M, N & ::= & \prod_i \mu_i & \text{(multiplications)}
\end{array}
\]

Fig. 1. Types and related notions: $a$ and $p$ are type and multiplicity variables, respectively, 
and $D$ represents a type constructor. 

Constructors play an important role in $\lambda^q_\to$. As we will see later, they can be 
used to witness unrestrictedness, similarly to ! of !e in a linear type system [33]. 


2.2 Types 


Types and related notations are defined in Fig. 1. Types are separated into 
monotypes and polytypes (or, type schemes). Monotypes consist of (rigid) type 
variables $a$, datatypes $D\ \overline{\mu}\ \overline{\tau}$, and multiplicity-annotated function types $\tau_1 \to_\mu \tau_2$. 
Here, a multiplicity $\mu$ is either 1 (linear), $\omega$ (unrestricted), or a (rigid) multiplicity 
variable $p$. Polytypes have the form $\forall \overline{p}\,\overline{a}.\ Q \Rightarrow \tau$, where $Q$ is a constraint that 
is a conjunction of predicates. A predicate $\phi$ has the form $M \le M'$, where 
$M$ and $M'$ are multiplications of multiplicities. We shall sometimes treat $Q$ as 
a set of predicates, which means that we shall rewrite $Q$ according to contexts 
by the idempotent commutative monoid laws of $\wedge$. We call both multiplicity ($p$) 
and type ($a$) variables type-level variables, and write $\mathrm{ftv}(t)$ for the set of free 
type-level variables in syntactic objects $t$ (such as types and constraints). 

The relation ($\le$) and operator ($\cdot$) in predicates denote the corresponding 
relation and operator on $\{1, \omega\}$, respectively. On $\{1, \omega\}$, ($\le$) is defined as the 
reflexive closure of $1 \le \omega$; note that $(\{1, \omega\}, \le)$ forms a total order. Multiplication 
($\cdot$) on $\{1, \omega\}$ is defined by 

\[ 1 \cdot m = m \cdot 1 = m \qquad \omega \cdot m = m \cdot \omega = \omega. \]

For simplicity, we shall sometimes omit ($\cdot$) and write $m_1 m_2$ for $m_1 \cdot m_2$. Note 
that, for $m_1, m_2 \in \{1, \omega\}$, $m_1 \cdot m_2$ is the least upper bound of $m_1$ and $m_2$ with 
respect to $\le$. As a result, $m_1 \cdot m_2 \le m$ holds if and only if $(m_1 \le m) \wedge (m_2 \le m)$ 
holds; we will use this property for efficient handling of constraints (Sect. 3.2). 
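
The definitions of ($\le$) and ($\cdot$) on $\{1, \omega\}$ are small enough to check exhaustively. The following Python sketch is a minimal illustration under assumed names (ONE, OMEGA, leq, mul are not from the paper): it encodes the two concrete multiplicities and verifies the least-upper-bound property stated above for all eight combinations of values.

```python
from itertools import product

ONE, OMEGA = "1", "ω"          # the two concrete multiplicities
RANK = {ONE: 0, OMEGA: 1}      # encodes the total order 1 ≤ ω

def leq(m, n):
    """m ≤ n in the reflexive closure of 1 ≤ ω."""
    return RANK[m] <= RANK[n]

def mul(m, n):
    """Multiplication: 1 is the unit and ω is absorbing."""
    return OMEGA if OMEGA in (m, n) else ONE

def lub_property_holds():
    """Check m1·m2 ≤ m  iff  (m1 ≤ m) and (m2 ≤ m) for all values."""
    return all(
        leq(mul(m1, m2), m) == (leq(m1, m) and leq(m2, m))
        for m1, m2, m in product((ONE, OMEGA), repeat=3)
    )
```

The property is what allows a predicate with a product on the left-hand side to be split into predicates with single multiplicities on the left.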
We assume a fixed set of constructors given beforehand. Each constructor 
is assigned a type of the form $\forall \overline{p}\,\overline{a}.\ \tau_1 \to_{\nu_1} \cdots \to_{\nu_{n-1}} \tau_n \to_{\nu_n} D\ \overline{p}\ \overline{a}$, where 
each $\tau_i$ and $\nu_i$ do not contain free type-level variables other than $\{\overline{p}\,\overline{a}\}$, i.e., 
$\bigcup_i \mathrm{ftv}(\tau_i, \nu_i) \subseteq \{\overline{p}\,\overline{a}\}$. For simplicity, we write the above type as $\forall \overline{p}\,\overline{a}.\ \overline{\tau} \to_{\overline{\nu}} D\ \overline{p}\ \overline{a}$. 
We assume that types are well-kinded, which effectively means that $D$ is applied 
to the same numbers of multiplicity arguments and type arguments among the 
constructor types. Usually, it suffices to use constructors of linear function types 
as below because they can be used in both linear and unrestricted code. 


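\[
\begin{array}{l}
(-,-) : \forall a\, b.\ a \to_1 b \to_1 a \otimes b \\
\mathsf{Nil} : \forall a.\ \mathsf{List}\ a \qquad \mathsf{Cons} : \forall a.\ a \to_1 \mathsf{List}\ a \to_1 \mathsf{List}\ a
\end{array}
\]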


In general, constructors can encapsulate arguments’ multiplicities as below, 
which is useful when a function returns both linear and unrestricted results. 


\[ \mathsf{MkUn} : \forall a.\ a \to_\omega \mathsf{Un}\ a \qquad \mathsf{MkMany} : \forall p\, a.\ a \to_p \mathsf{Many}\ p\ a \]


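For example, a function that reads a value from a mutable array at a given 
index can be given as a primitive of type 
$\mathsf{readMArray} : \forall a.\ \mathsf{MArray}\ a \to_1 \mathsf{Int} \to_\omega (\mathsf{MArray}\ a \otimes \mathsf{Un}\ a)$ [7]. Multiplicity-parameterized constructors become useful 
when the multiplicity of contents can vary. For example, the type $\mathsf{IO}_L\ p\ a$ with 
the constructor $\mathsf{MkIO}_L : (\mathsf{World} \to_1 (\mathsf{World} \otimes \mathsf{Many}\ p\ a)) \to_1 \mathsf{IO}_L\ p\ a$ can 
represent the IO monad [7] with methods $\mathsf{return} : \forall p\, a.\ a \to_p \mathsf{IO}_L\ p\ a$ and 
$(\gg\!=) : \forall p\, q\, a\, b.\ \mathsf{IO}_L\ p\ a \to_1 (a \to_p \mathsf{IO}_L\ q\ b) \to_1 \mathsf{IO}_L\ q\ b$. 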


2.3 Typing Rules 


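Our type system uses two sorts of environments. A typing environment maps 
variables into polytypes (as usual in non-linear calculi), and a multiplicity 
environment maps variables into multiplications of multiplicities. This separation of 
the two will be convenient when we discuss type inference. As usual, we write 
$x_1 : A_1, \ldots, x_n : A_n$ instead of $\{x_1 \mapsto A_1, \ldots, x_n \mapsto A_n\}$ for typing environments. 
For multiplicity environments, we use multiset-like notation as $x_1^{M_1}, \ldots, x_n^{M_n}$. 
We use the following operations on multiplicity environments:³ 

\[
\begin{array}{rcl}
(\Delta_1 + \Delta_2)(x) &=& \begin{cases} \omega & \text{if } x \in \mathrm{dom}(\Delta_1) \cap \mathrm{dom}(\Delta_2) \\ \Delta_i(x) & \text{if } x \in \mathrm{dom}(\Delta_i) \setminus \mathrm{dom}(\Delta_j)\ (i \ne j \in \{1, 2\}) \end{cases} \\[2ex]
(\mu\Delta)(x) &=& \mu \cdot \Delta(x) \\[1ex]
(\Delta_1 \sqcup \Delta_2)(x) &=& \begin{cases} \Delta_1(x) \cdot \Delta_2(x) & \text{if } x \in \mathrm{dom}(\Delta_1) \cap \mathrm{dom}(\Delta_2) \\ \omega & \text{if } x \in \mathrm{dom}(\Delta_i) \setminus \mathrm{dom}(\Delta_j)\ (i \ne j \in \{1, 2\}) \end{cases}
\end{array}
\]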


3 In these definitions, we implicitly consider multiplicity 0 and regard $\Delta(x) = 0$ if 
$x \notin \mathrm{dom}(\Delta)$. It is natural that $0 + m = m + 0 = m$. With 0, multiplication $\cdot$, which is 
extended as $0 \cdot m = m \cdot 0 = 0$, no longer computes the least upper bound. Therefore, 
we use $\sqcup$ for the last definition; in fact, the definition corresponds to the pointwise 
computation of $\Delta_1(x) \sqcup \Delta_2(x)$, where $\le$ is extended as $0 \le \omega$ but not $0 \le 1$. This 
treatment of 0 coincides with that in the Linear Haskell proposal [26]. 
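
The three operations on multiplicity environments can be sketched directly as dictionary manipulations. The Python code below is an illustration under simplifying assumptions: environments map variable names to the concrete multiplicities "1" or "ω" only (no multiplicity variables or products), absence from the dictionary plays the role of the implicit 0, and the function names env_add, env_scale, and env_join are hypothetical.

```python
ONE, OMEGA = "1", "ω"

def mul(m, n):
    """Multiplication on {1, ω}: 1 is the unit and ω is absorbing."""
    return OMEGA if OMEGA in (m, n) else ONE

def env_add(d1, d2):
    """Δ1 + Δ2: a variable used on both sides is used ω times."""
    out = {}
    for x in d1.keys() | d2.keys():
        if x in d1 and x in d2:
            out[x] = OMEGA
        else:
            out[x] = d1[x] if x in d1 else d2[x]
    return out

def env_scale(mu, d):
    """μΔ: every use recorded in Δ is multiplied by μ."""
    return {x: mul(mu, m) for x, m in d.items()}

def env_join(d1, d2):
    """Δ1 ⊔ Δ2: linear only if both branches use the variable linearly."""
    out = {}
    for x in d1.keys() | d2.keys():
        if x in d1 and x in d2:
            out[x] = mul(d1[x], d2[x])
        else:
            out[x] = OMEGA   # used in one branch, unused in the other
    return out
```

For instance, env_add({}, env_scale(OMEGA, {"x": ONE})) gives {"x": "ω"}, which mirrors passing a linearly used x to a function whose argument is unrestricted.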
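\[ \dfrac{Q; \Gamma; \Delta' \vdash e : \tau' \qquad Q \models \Delta = \Delta' \qquad Q \models \tau \sim \tau'}{Q; \Gamma; \Delta \vdash e : \tau}\ \textsc{Eq} \]

\[ \dfrac{\Gamma(x) = \forall \overline{p}\,\overline{a}.\ Q' \Rightarrow \tau \qquad Q \models Q'[\overline{p \mapsto \mu}] \qquad Q \models x^1 \le \Delta}{Q; \Gamma; \Delta \vdash x : \tau[\overline{p \mapsto \mu}, \overline{a \mapsto \sigma}]}\ \textsc{Var} \]

\[ \dfrac{Q; \Gamma, x : \sigma; \Delta, x^\mu \vdash e : \tau}{Q; \Gamma; \Delta \vdash \lambda x.e : \sigma \to_\mu \tau}\ \textsc{Abs} \qquad \dfrac{Q; \Gamma; \Delta_1 \vdash e_1 : \sigma \to_\mu \tau \qquad Q; \Gamma; \Delta_2 \vdash e_2 : \sigma}{Q; \Gamma; \Delta_1 + \mu\Delta_2 \vdash e_1\ e_2 : \tau}\ \textsc{App} \]

\[ \dfrac{C : \forall \overline{p}\,\overline{a}.\ \overline{\tau} \to_{\overline{\nu}} D\ \overline{p}\ \overline{a} \qquad \{Q; \Gamma; \Delta_i \vdash e_i : \tau_i[\overline{p \mapsto \mu}, \overline{a \mapsto \sigma}]\}_i}{Q; \Gamma; \omega\Delta_0 + \sum_i \nu_i[\overline{p \mapsto \mu}]\Delta_i \vdash C\ \overline{e} : D\ \overline{\mu}\ \overline{\sigma}}\ \textsc{Con} \]

\[ \dfrac{\begin{array}{c} Q; \Gamma; \Delta_0 \vdash e_0 : D\ \overline{\mu}\ \overline{\sigma} \qquad C_i : \forall \overline{p}\,\overline{a}.\ \overline{\tau_i} \to_{\overline{\nu_i}} D\ \overline{p}\ \overline{a} \\ Q; \Gamma, \overline{x_i : \tau_i[\overline{p \mapsto \mu}, \overline{a \mapsto \sigma}]}; \Delta_i, \overline{x_i}^{\mu_0 \overline{\nu_i}[\overline{p \mapsto \mu}]} \vdash e_i : \tau' \end{array}}{Q; \Gamma; \mu_0\Delta_0 + \bigsqcup_i \Delta_i \vdash \mathbf{case}\ e_0\ \mathbf{of}\ \{C_i\ \overline{x_i} \to e_i\}_i : \tau'}\ \textsc{Case} \]


Fig. 2. Typing relation for expressions 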


Intuitively, $\Delta(x)$ represents the number of uses of $x$. So, in the definition of 
$\Delta_1 + \Delta_2$, we have $(\Delta_1 + \Delta_2)(x) = \omega$ if $x \in \mathrm{dom}(\Delta_1) \cap \mathrm{dom}(\Delta_2)$ because this 
condition means that $x$ is used in two places. Operation $\Delta_1 \sqcup \Delta_2$ is used for case 
branches. Suppose that a branch $e_1$ uses variables as $\Delta_1$ and another branch $e_2$ 
uses variables as $\Delta_2$. Then, putting the branches together, variables are used 
as $\Delta_1 \sqcup \Delta_2$. The definition says that $x$ is considered to be used linearly in the 
two branches put together if and only if both branches use $x$ linearly, where 
non-linear use includes unrestricted use ($\Delta_i(x) = \omega$) and non-use ($x \notin \mathrm{dom}(\Delta_i)$). 

We write $Q \models Q'$ if $Q$ logically entails $Q'$. That is, for any valuation of 
multiplicity variables $\theta(p) \in \{1, \omega\}$, $Q'\theta$ holds if $Q\theta$ does. For example, we have 
$p \le r \wedge r \le q \models p \le q$. We extend the notation to multiplicity environments 
and write $Q \models \Delta_1 \le \Delta_2$ if $\mathrm{dom}(\Delta_1) \subseteq \mathrm{dom}(\Delta_2)$ and 
$Q \models \bigwedge_{x \in \mathrm{dom}(\Delta_1)} \Delta_1(x) \le \Delta_2(x) \wedge \bigwedge_{x \in \mathrm{dom}(\Delta_2) \setminus \mathrm{dom}(\Delta_1)} \omega \le \Delta_2(x)$ hold. We also write $Q \models \Delta_1 = \Delta_2$ if both 
$Q \models \Delta_1 \le \Delta_2$ and $Q \models \Delta_2 \le \Delta_1$ hold. We then have the following properties. 
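
Since multiplicity variables range over the two-element set $\{1, \omega\}$, the entailment $Q \models Q'$ can be decided by enumerating valuations. The Python sketch below does exactly that; it is exponential in the number of variables and is meant only to make the definition concrete (the paper discusses an efficient check in Sect. 3.2). The representation is an assumption of this sketch: a predicate $M \le M'$ is a pair of tuples of atoms, where an atom is "1", "ω", or a variable name, and a constraint is a list of predicates.

```python
from itertools import product

ONE, OMEGA = "1", "ω"

def atom_value(atom, theta):
    """Value of a constant or of a variable under valuation theta."""
    return atom if atom in (ONE, OMEGA) else theta[atom]

def product_value(atoms, theta):
    """A product is ω iff some factor is ω (the empty product is 1)."""
    return OMEGA if any(atom_value(a, theta) == OMEGA for a in atoms) else ONE

def holds(constraint, theta):
    """M ≤ M' fails only when M evaluates to ω and M' evaluates to 1."""
    return all(
        not (product_value(lhs, theta) == OMEGA
             and product_value(rhs, theta) == ONE)
        for lhs, rhs in constraint
    )

def variables(*constraints):
    """Sorted list of the multiplicity variables occurring in the constraints."""
    return sorted({a for c in constraints for lhs, rhs in c
                   for a in lhs + rhs if a not in (ONE, OMEGA)})

def entails(q, q2):
    """Q ⊨ Q': every valuation that satisfies Q also satisfies Q'."""
    vs = variables(q, q2)
    for vals in product((ONE, OMEGA), repeat=len(vs)):
        theta = dict(zip(vs, vals))
        if holds(q, theta) and not holds(q2, theta):
            return False
    return True
```

With this encoding, the example from the text is entails([(("p",), ("r",)), (("r",), ("q",))], [(("p",), ("q",))]).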


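Lemma 1. Suppose $Q \models \Delta \le \Delta'$ and $Q \models \Delta = \Delta_1 + \Delta_2$. Then, there are some 
$\Delta'_1$ and $\Delta'_2$ such that $Q \models \Delta' = \Delta'_1 + \Delta'_2$, $Q \models \Delta_1 \le \Delta'_1$, and $Q \models \Delta_2 \le \Delta'_2$. 


Lemma 2. $Q \models \mu\Delta \le \Delta'$ implies $Q \models \Delta \le \Delta'$. 


Lemma 3. $Q \models \Delta_1 \sqcup \Delta_2 \le \Delta'$ implies $Q \models \Delta_1 \le \Delta'$ and $Q \models \Delta_2 \le \Delta'$. 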


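Constraints $Q$ affect type equality; for example, under $Q = p \le q \wedge q \le p$, 
$\sigma \to_p \tau$ and $\sigma \to_q \tau$ become equivalent. Formally, we write $Q \models \tau \sim \tau'$ if 
$\tau\theta = \tau'\theta$ for any valuation $\theta$ of multiplicity variables that makes $Q\theta$ true. 

Now, we are ready to define the typing judgment for expressions, $Q; \Gamma; \Delta \vdash e : \tau$, 
which reads that under assumption $Q$, typing environment $\Gamma$, and multiplicity 
environment $\Delta$, expression $e$ has monotype $\tau$, by the typing rules in Fig. 2. Here, 
we assume $\mathrm{dom}(\Delta) \subseteq \mathrm{dom}(\Gamma)$. Having $x \in \mathrm{dom}(\Gamma) \setminus \mathrm{dom}(\Delta)$ means that the 
multiplicity of $x$ is essentially 0 in $e$. 

Rule EQ says that we can replace $\tau$ and $\Delta$ with equivalent ones in typing. 


\[ \dfrac{}{\Gamma \vdash \epsilon}\ \textsc{Empty} \qquad \dfrac{Q; \Gamma; \Delta \vdash e : \tau \qquad \overline{p}\,\overline{a} = \mathrm{ftv}(Q, \tau) \qquad \Gamma, f : \forall \overline{p}\,\overline{a}.\ Q \Rightarrow \tau \vdash \mathit{prog}}{\Gamma \vdash f = e; \mathit{prog}}\ \textsc{Bind} \]

\[ \dfrac{Q; \Gamma; \Delta \vdash e : \tau \qquad \overline{p}\,\overline{a} = \mathrm{ftv}(Q, \tau) \qquad \Gamma, f : \forall \overline{p}\,\overline{a}.\ Q \Rightarrow \tau \vdash \mathit{prog}}{\Gamma \vdash f : (\forall \overline{p}\,\overline{a}.\ Q \Rightarrow \tau) = e; \mathit{prog}}\ \textsc{BindA} \]


Fig. 3. Typing rules for programs 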


Rule VAR says that $x$ is used once in a variable expression $x$, but it is safe to 
regard that the expression uses $x$ more than once and uses other variables $\omega$ times. 
At the same time, the type $\forall \overline{p}\,\overline{a}.\ Q' \Rightarrow \tau$ of $x$ is instantiated to $\tau[\overline{p \mapsto \mu}, \overline{a \mapsto \sigma}]$, 
yielding constraints $Q'[\overline{p \mapsto \mu}]$, which must be entailed by $Q$. 

Rule ABS says that $\lambda x.e$ has type $\sigma \to_\mu \tau$ if $e$ has type $\tau$, assuming that 
the use of $x$ in $e$ is $\mu$. Unlike the original $\lambda^q_\to$ [7], in our system, multiplicity 
annotations on arrows must be $\mu$, i.e., 1, $\omega$, or a multiplicity variable, instead of 
$M$. This does not limit the expressiveness because such general arrow types can 
be represented by type $\sigma \to_p \tau$ with constraints $p \le M \wedge M \le p$. 

Rule APP sketches an important principle in $\lambda^q_\to$: when an expression with 
variable use $\Delta$ is used $\mu$-many times, the variable use in the expression becomes 
$\mu\Delta$. Thus, since we pass $e_2$ (with variable use $\Delta_2$) to $e_1$, where $e_1$ uses the 
argument $\mu$-many times as described in its type $\sigma \to_\mu \tau$, the use of variables in 
$e_2$ of $e_1\ e_2$ becomes $\mu\Delta_2$. For example, for $(\lambda y.42)\ x$, $x$ is considered to be used 
$\omega$ times because $(\lambda y.42)$ has type $\sigma \to_\omega \mathsf{Int}$ for any $\sigma$. 

Rule CON is nothing but a combination of VAR and APP. The $\omega\Delta_0$ part is 
only useful when $C$ is nullary; otherwise, we can weaken $\Delta$ at leaves. 

Rule CASE is the most complicated rule in this type system. In this rule, $\mu_0$ 
represents how many times the scrutinee $e_0$ is used in the case. If $\mu_0 = \omega$, the 
pattern bound variables can be used unrestrictedly, and if $\mu_0 = 1$, the pattern 
bound variables can be used according to the multiplicities of the arguments of the 
constructor.⁴ Thus, in the $i$th branch, variables in $\overline{x_i}$ can be used as $\mu_0\overline{\nu_i}[\overline{p \mapsto \mu}]$, 
where $\overline{\nu_i}[\overline{p \mapsto \mu}]$ represents the multiplicities of the arguments of the constructor 
$C_i$. Other than $\overline{x_i}$, each branch body $e_i$ can contain free variables used as $\Delta_i$. 
Thus, the uses of free variables in the whole branch bodies are summarized as 
$\bigsqcup_i \Delta_i$. Recall that the case uses the scrutinee $\mu_0$ times; thus, the whole uses of 
variables are estimated as $\mu_0\Delta_0 + \bigsqcup_i \Delta_i$. 

Then, we define the typing judgment for programs, $\Gamma \vdash \mathit{prog}$, which reads that 
program prog is well-typed under $\Gamma$, by the typing rules in Fig. 3. At this point, 
the rules BIND and BINDA have no significant differences; their difference will become 
clear when we discuss type inference. In the rules BIND and BINDA, we assume 
that $\Gamma$ contains no free type-level variables. Therefore, we can safely generalize 
all free type-level variables in $Q$ and $\tau$. We do not check the use $\Delta$ in both rules 
as bound variables are assumed to be used arbitrarily many times in the rest 
of the program; that is, the multiplicity of a bound variable is $\omega$ and its body 
uses variables as $\omega\Delta$, which maps $x \in \mathrm{dom}(\Delta)$ to $\omega$ and has no free type-level 
variables. 


4 This behavior, inherited from $\lambda^q_\to$ [7], implies the isomorphism $!(A \otimes B) \equiv\ !A \otimes\ !B$, 
which is not a theorem in the standard linear logic. The isomorphism intuitively means 
that unrestricted products can (only) be constructed from unrestricted components, 
as commonly adopted in linearity-via-kind approaches [11, 21, 24, 28, 29]. 


2.4 Metatheories 


Lemma 4 is the standard weakening property. Lemma 5 says that we can replace 
Q with a stronger one, Lemma 6 says that we can replace A with a greater one, 
and Lemma 7 says that we can substitute type-level variables in a term-in-context 
without violating typeability. These lemmas state some sort of weakening, and 
the last three lemmas clarify the goal of our inference system discussed in Sect. 3. 


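Lemma 4. $Q; \Gamma; \Delta \vdash e : \tau$ implies $Q; \Gamma, x : A; \Delta \vdash e : \tau$. 

Lemma 5. $Q; \Gamma; \Delta \vdash e : \tau$ and $Q' \models Q$ implies $Q'; \Gamma; \Delta \vdash e : \tau$. 

Lemma 6. $Q; \Gamma; \Delta \vdash e : \tau$ and $Q \models \Delta \le \Delta'$ implies $Q; \Gamma; \Delta' \vdash e : \tau$. 

Lemma 7. $Q; \Gamma; \Delta \vdash e : \tau$ implies $Q\theta; \Gamma\theta; \Delta\theta \vdash e : \tau\theta$. 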


We have the following form of the substitution lemma: 


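Lemma 8 (Substitution). Suppose $Q_0; \Gamma, \overline{x : \sigma}; \Delta_0, \overline{x^{\mu}} \vdash e : \tau$, and $Q_i; \Gamma; \Delta_i \vdash e'_i : \sigma_i$ 
for each $i$. Then, $Q_0 \wedge \bigwedge_i Q_i; \Gamma; \Delta_0 + \sum_i \mu_i\Delta_i \vdash e[\overline{x \mapsto e'}] : \tau$. 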


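Subject Reduction. We show the subject reduction property for a simple call-by- 
name semantics. Consider the standard small-step call-by-name relation $e \longrightarrow e'$ 
with the following $\beta$-reduction rules (we omit the congruence rules): 

\[ (\lambda x.e_1)\ e_2 \longrightarrow e_1[x \mapsto e_2] \qquad \mathbf{case}\ C_j\ \overline{e_j}\ \mathbf{of}\ \{C_i\ \overline{x_i} \to e'_i\}_i \longrightarrow e'_j[\overline{x_j \mapsto e_j}] \]

Then, by Lemma 8, we have the following subject reduction property: 


Lemma 9 (Subject Reduction). $Q; \Gamma; \Delta \vdash e : \tau$ and $e \longrightarrow e'$ implies 
$Q; \Gamma; \Delta \vdash e' : \tau$. 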

Lemma 9 holds even for the call-by-value reduction, though with a caveat. 
For a program $f_1 = e_1; \ldots; f_n = e_n$, it can happen that some $e_i$ is typed 
only under unsatisfiable (i.e., conflicting) $Q_i$. As conflicting $Q_i$ means that $e_i$ 
is essentially ill-typed, evaluating $e_i$ may not be safe. However, the standard 
call-by-value strategy evaluates $e_i$ even when $f_i$ is not used at all, in which case the 
type system does not reject this unsatisfiability. This issue can be addressed 
by the standard witness-passing transformation [15] that converts programs so 
that $Q \Rightarrow \tau$ becomes $W_Q \to \tau$, where $W_Q$ represents a set of witnesses of $Q$. 
Nevertheless, it would be reasonable to reject conflicting constraints locally. 

We then state the correspondence with the original system [7] (assuming the 
modification [6] for the variable case) to show that the qualified-typed version 
captures the linearity as the original. While the original system assumes the 
call-by-need evaluation, Lemma 9 could be lifted to that case. 


5 In the premise of VAR, the original [7] uses $\exists \Delta'.\ \Delta = x^1 + \omega\Delta'$, which is modified 
to $x^1 \le \Delta$ in [6]. The difference between the two becomes clear when $\Delta(x) = p$, for 
which the former one does not hold as we are not able to choose $\Delta'$ depending on $p$. 


Theorem 1. If $\top; \Gamma; \Delta \vdash e : \tau$ where $\Gamma$ contains only monotypes, then $e$ is also 
well-typed in the original $\lambda^q_\to$ under some environment. 


The main reason for the monotype restriction is that our polytypes are strictly 
more expressive than their (rank-1) polytypes. This extra expressiveness comes 
from predicates of the form $\cdots \le M \cdot M'$. Indeed, 
$f = \lambda x.\ \mathbf{case}\ x\ \mathbf{of}\ \{\mathsf{MkMany}\ y \to (y, y)\}$ has type 
$\forall p\, q\, a.\ \omega \le p \cdot q \Rightarrow \mathsf{Many}\ p\ a \to_q a \otimes a$ in our system, while it 
has three incomparable types in the original $\lambda^q_\to$. 


3 Type Inference 


In this section, we give a type inference method for the type system in the 
previous section. Following [31, Section 3], we adopt the standard two-phase 
approach; we first gather constraints on types and then solve them. As mentioned 
in Sect. 1, the inference system described here has the issue of ambiguity, which 
will be addressed in Sect. 4. 


3.1 Inference Algorithm 


We first extend types $\tau$ and multiplicities $\mu$ to include unification variables. 

\[ \tau ::= \cdots \mid \alpha \qquad \mu ::= \cdots \mid \pi \]

We call $\alpha$/$\pi$ a unification type/multiplicity variable, which will be substituted 
by a concrete type/multiplicity (including rigid variables) during the inference. 
Similarly to $\mathrm{ftv}(t)$, we write $\mathrm{fuv}(t)$ for the unification variables (of both sorts) in 
$t$, where $t$ ranges over any syntactic element (such as $\tau$, $Q$, $\Gamma$, and $\Delta$). 

Besides $Q$, the algorithm will generate equality constraints $\tau \sim \tau'$. Formally, 
the sets of generated constraints $C$ and generated predicates $\psi$ are given by 

\[ C ::= \bigwedge_i \psi_i \qquad \psi ::= \phi \mid \tau \sim \tau' \]


Then, we define the type inference judgment for expressions, $\Gamma \triangleright e : \tau \leadsto \Delta; C$, 
which reads that, given $\Gamma$ and $e$, type $\tau$ is inferred together with variable use $\Delta$ 
and constraints $C$, by the rules in Fig. 4. Note that $\Delta$ is synthesized as well 
as $\tau$ and $C$ in this step. This difference in the treatment of $\Gamma$ and $\Delta$ is why we 
separate multiplicity environments $\Delta$ from typing environments $\Gamma$. 

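Gathered constraints are solved when we process top-level bindings. Figure 5 
defines the type inference judgment for programs, $\Gamma \triangleright \mathit{prog}$, which reads that the 
inference finds prog well-typed under $\Gamma$. In the rules, manipulation of constraints 
is done by the simplification judgment $Q \vdash_{\mathrm{simp}} C \leadsto Q'; \theta$, which simplifies 
$C$ under the assumption $Q$ into the pair $(Q', \theta)$ of residual constraints $Q'$ and 
substitution $\theta$ for unification variables, where $(Q', \theta)$ is expected to be equivalent 
in some sense to $C$ under the assumption $Q$. The idea underlying our simplification 
is to solve type equality constraints in $C$ as much as possible and then remove 
predicates that are implied by $Q$. Rules S-FUN, S-DATA, S-UNI, and S-TRIV 
are responsible for the former, which decompose type equality constraints and 
yield substitutions once either of the sides becomes a unification variable. Rules 
S-ENTAIL and S-REM are responsible for the latter, which remove predicates 
implied by $Q$ and then return the residual constraints. Rule S-ENTAIL checks 
$Q \models \phi$; a concrete method for this check will be discussed in Sect. 3.2. 


\[ \dfrac{\Gamma(x) = \forall \overline{p}\,\overline{a}.\ Q \Rightarrow \tau \qquad \overline{\alpha}, \overline{\pi} : \text{fresh}}{\Gamma \triangleright x : \tau[\overline{p \mapsto \pi}, \overline{a \mapsto \alpha}] \leadsto x^1; Q[\overline{p \mapsto \pi}]} \qquad \dfrac{\Gamma, x : \alpha \triangleright e : \tau \leadsto \Delta, x^M; C \qquad \alpha, \pi : \text{fresh}}{\Gamma \triangleright \lambda x.e : \alpha \to_\pi \tau \leadsto \Delta; C \wedge M \le \pi} \]

\[ \dfrac{\Gamma \triangleright e_1 : \tau_1 \leadsto \Delta_1; C_1 \qquad \Gamma \triangleright e_2 : \tau_2 \leadsto \Delta_2; C_2 \qquad \beta, \pi : \text{fresh}}{\Gamma \triangleright e_1\ e_2 : \beta \leadsto \Delta_1 + \pi\Delta_2; C_1 \wedge C_2 \wedge \tau_1 \sim (\tau_2 \to_\pi \beta)} \]

\[ \dfrac{C : \forall \overline{p}\,\overline{a}.\ \overline{\sigma} \to_{\overline{\nu}} D\ \overline{p}\ \overline{a} \qquad \{\Gamma \triangleright e_i : \tau_i \leadsto \Delta_i; C_i\}_i \qquad \overline{\alpha}, \overline{\pi} : \text{fresh}}{\Gamma \triangleright C\ \overline{e} : D\ \overline{\pi}\ \overline{\alpha} \leadsto \sum_i \nu_i[\overline{p \mapsto \pi}]\Delta_i; \bigwedge_i \left(C_i \wedge \tau_i \sim \sigma_i[\overline{p \mapsto \pi}, \overline{a \mapsto \alpha}]\right)} \]

\[ \dfrac{\begin{array}{c} \Gamma \triangleright e_0 : \tau_0 \leadsto \Delta_0; C_0 \qquad \pi_0, \overline{\pi_i}, \overline{\alpha_i}, \beta : \text{fresh} \\ \Gamma, \overline{x_i : \tau_i[\overline{p \mapsto \pi_i}, \overline{a \mapsto \alpha_i}]} \triangleright e_i : \tau'_i \leadsto \Delta_i, \overline{x_i^{M_i}}; C_i \\ C' = C_0 \wedge \bigwedge_i \left(C_i \wedge \beta \sim \tau'_i \wedge (\tau_0 \sim D\ \overline{\pi_i}\ \overline{\alpha_i}) \wedge \bigwedge_j M_{ij} \le \pi_0 \nu_{ij}[\overline{p \mapsto \pi_i}]\right) \end{array}}{\Gamma \triangleright \mathbf{case}\ e_0\ \mathbf{of}\ \{C_i\ \overline{x_i} \to e_i\}_i : \beta \leadsto \pi_0\Delta_0 + \bigsqcup_i \Delta_i; C'} \]

Fig. 4. Type inference rules for expressions 


\[ \dfrac{\begin{array}{c} \Gamma \triangleright e : \tau \leadsto \Delta; C \qquad \top \vdash_{\mathrm{simp}} C \leadsto Q; \theta \qquad \{\overline{\pi}\,\overline{\alpha}\} = \mathrm{fuv}(Q, \tau\theta) \\ \overline{p}, \overline{a} : \text{fresh} \qquad \Gamma, f : \forall \overline{p}\,\overline{a}.\ (Q \Rightarrow \tau\theta)[\overline{\alpha \mapsto a}, \overline{\pi \mapsto p}] \triangleright \mathit{prog} \end{array}}{\Gamma \triangleright f = e; \mathit{prog}} \]

\[ \dfrac{\Gamma \triangleright e : \sigma \leadsto \Delta; C \qquad Q \vdash_{\mathrm{simp}} C \wedge \tau \sim \sigma \leadsto \top; \theta \qquad \Gamma, f : \forall \overline{p}\,\overline{a}.\ Q \Rightarrow \tau \triangleright \mathit{prog}}{\Gamma \triangleright f : (\forall \overline{p}\,\overline{a}.\ Q \Rightarrow \tau) = e; \mathit{prog}} \]


Fig. 5. Type inference rules for programs 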


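\[ \dfrac{Q \vdash_{\mathrm{simp}} \sigma \sim \sigma' \wedge \mu \le \mu' \wedge \mu' \le \mu \wedge \tau \sim \tau' \wedge C \leadsto Q'; \theta}{Q \vdash_{\mathrm{simp}} (\sigma \to_\mu \tau) \sim (\sigma' \to_{\mu'} \tau') \wedge C \leadsto Q'; \theta}\ \textsc{S-Fun} \]

\[ \dfrac{Q \vdash_{\mathrm{simp}} \overline{\mu} \le \overline{\mu'} \wedge \overline{\mu'} \le \overline{\mu} \wedge \overline{\sigma} \sim \overline{\sigma'} \wedge C \leadsto Q'; \theta}{Q \vdash_{\mathrm{simp}} (D\ \overline{\mu}\ \overline{\sigma}) \sim (D\ \overline{\mu'}\ \overline{\sigma'}) \wedge C \leadsto Q'; \theta}\ \textsc{S-Data} \]

\[ \dfrac{\alpha \notin \mathrm{fuv}(\tau) \qquad Q \vdash_{\mathrm{simp}} C[\alpha \mapsto \tau] \leadsto Q'; \theta}{Q \vdash_{\mathrm{simp}} \alpha \sim \tau \wedge C \leadsto Q'; \theta \circ [\alpha \mapsto \tau]}\ \textsc{S-Uni} \qquad \dfrac{Q \vdash_{\mathrm{simp}} C \leadsto Q'; \theta}{Q \vdash_{\mathrm{simp}} \tau \sim \tau \wedge C \leadsto Q'; \theta}\ \textsc{S-Triv} \]

\[ \dfrac{Q \wedge Q_w \models \phi \qquad Q \vdash_{\mathrm{simp}} Q_w \wedge C \leadsto Q'; \theta}{Q \vdash_{\mathrm{simp}} \phi \wedge Q_w \wedge C \leadsto Q'; \theta}\ \textsc{S-Entail} \qquad \dfrac{\text{no other rules can apply}}{Q \vdash_{\mathrm{simp}} Q' \leadsto Q'; \emptyset}\ \textsc{S-Rem} \]


Fig. 6. Simplification rules (modulo commutativity and associativity of $\wedge$ and 
commutativity of $\sim$) 


Example 1 (app). Let us illustrate how the system infers a type for 
$\mathsf{app} = \lambda f.\lambda x.f\ x$. We have the following derivation for its body $\lambda f.\lambda x.f\ x$: 

\[
\dfrac{\dfrac{\dfrac{f : \alpha_f, x : \alpha_x \triangleright f : \alpha_f \leadsto f^1; \top \qquad f : \alpha_f, x : \alpha_x \triangleright x : \alpha_x \leadsto x^1; \top}{f : \alpha_f, x : \alpha_x \triangleright f\ x : \beta \leadsto f^1, x^\pi; \alpha_f \sim (\alpha_x \to_\pi \beta)}}{f : \alpha_f \triangleright \lambda x.f\ x : \alpha_x \to_{\pi_x} \beta \leadsto f^1; \alpha_f \sim (\alpha_x \to_\pi \beta) \wedge \pi \le \pi_x}}{\triangleright\ \lambda f.\lambda x.f\ x : \alpha_f \to_{\pi_f} \alpha_x \to_{\pi_x} \beta \leadsto \emptyset; \alpha_f \sim (\alpha_x \to_\pi \beta) \wedge \pi \le \pi_x \wedge 1 \le \pi_f}
\]

The highlights in the above derivation are: 


— In the last two steps, $f$ is assigned type $\alpha_f$ and multiplicity $\pi_f$, and $x$ is 
assigned type $\alpha_x$ and multiplicity $\pi_x$. 

— Then, in the third last step, for $f\ x$, the system infers type $\beta$ with constraint 
$\alpha_f \sim (\alpha_x \to_\pi \beta)$. At the same time, the variable use in $f\ x$ is also inferred 
as $f^1, x^\pi$. Note that the use of $x$ is $\pi$ because it is passed to $f : \alpha_x \to_\pi \beta$. 

— After that, in the last two steps again, the system yields constraints $\pi \le \pi_x$ 
and $1 \le \pi_f$. 


As a result, the type $\tau = \alpha_f \to_{\pi_f} \alpha_x \to_{\pi_x} \beta$ is inferred with the constraint 
$C = \alpha_f \sim (\alpha_x \to_\pi \beta) \wedge \pi \le \pi_x \wedge 1 \le \pi_f$. 

Then, we try to assign a polytype to app by the rules in Fig. 5. By simplification, 
we have $\top \vdash_{\mathrm{simp}} C \leadsto \pi \le \pi_x; [\alpha_f \mapsto (\alpha_x \to_\pi \beta)]$. Thus, by generalizing 
$\tau[\alpha_f \mapsto (\alpha_x \to_\pi \beta)] = (\alpha_x \to_\pi \beta) \to_{\pi_f} \alpha_x \to_{\pi_x} \beta$ with $\pi \le \pi_x$, we obtain the 
following type for app: 


\[ \mathsf{app} : \forall p\, p_f\, p_x\, a\, b.\ p \le p_x \Rightarrow (a \to_p b) \to_{p_f} a \to_{p_x} b \]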
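
The constraint-generation half of Example 1 can be replayed with a few lines of code. The Python sketch below is a simplified illustration, not the paper's implementation: it covers only variables, applications, and λ-abstractions, keeps only monotypes in the environment, represents a multiplication as a tuple of atoms, and uses hypothetical names throughout. Treating an unused bound variable as use ω is an assumption of this sketch, chosen to agree with the implicit multiplicity 0 described for multiplicity environments.

```python
import itertools

_counter = itertools.count()

def fresh(prefix):
    """Return a fresh unification variable name (hypothetical naming scheme)."""
    return f"{prefix}{next(_counter)}"

def infer(env, e):
    """Return (type, usage, constraints) for expression e under env.

    Expressions: ("var", x), ("app", e1, e2), ("lam", x, body).
    Types: variable names or ("->", argument, multiplicity, result).
    """
    tag = e[0]
    if tag == "var":
        return env[e[1]], {e[1]: ("1",)}, []
    if tag == "app":
        t1, d1, c1 = infer(env, e[1])
        t2, d2, c2 = infer(env, e[2])
        beta, pi = fresh("b"), fresh("pi")
        scaled = {x: m + (pi,) for x, m in d2.items()}    # π·Δ2
        usage = {}
        for x in d1.keys() | scaled.keys():               # Δ1 + π·Δ2
            if x in d1 and x in scaled:
                usage[x] = ("ω",)
            else:
                usage[x] = d1[x] if x in d1 else scaled[x]
        return beta, usage, c1 + c2 + [("~", t1, ("->", t2, pi, beta))]
    if tag == "lam":
        x, body = e[1], e[2]
        alpha, pi = fresh("a"), fresh("pi")
        t, d, c = infer({**env, x: alpha}, body)
        m = d.pop(x, ("ω",))   # assumption: an unused variable counts as ω
        return ("->", alpha, pi, t), d, c + [("<=", m, pi)]
    raise ValueError(f"unknown expression: {e!r}")

# app = λf.λx.f x
APP = ("lam", "f", ("lam", "x", ("app", ("var", "f"), ("var", "x"))))
```

Calling infer({}, APP) returns a type of the shape α_f →_{π_f} α_x →_{π_x} β, an empty usage, and three constraints that correspond to α_f ∼ (α_x →_π β), 1·π ≤ π_x, and 1 ≤ π_f. Solving the equality and discarding the trivially true 1 ≤ π_f is the job of the simplification rules and is not attempted here.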


Correctness. We first prepare some definitions for the correctness discussions. 
First, we allow substitutions $\theta$ to replace unification multiplicity variables as well 
as unification type variables. Then, we extend the notion of $\models$ and write $C \models C'$ 
if $C'\theta$ holds when $C\theta$ holds. From now on, we require that substitutions are 
idempotent, i.e., $\tau\theta\theta = \tau\theta$ for any $\tau$, which excludes substitutions $[\alpha \mapsto \mathsf{List}\ \alpha]$ 
and $[\alpha \mapsto \beta, \beta \mapsto \mathsf{Int}]$ for example. Let us write $Q \models \theta = \theta'$ if $Q \models \tau\theta \sim \tau\theta'$ for 
any $\tau$. The restriction of a substitution $\theta$ to a domain $X$ is written as $\theta|_X$. 

Consider a pair $(Q_g, C_w)$, where we call $Q_g$ and $C_w$ given and wanted 
constraints, respectively. Then, a pair $(Q, \theta)$ is called a (sound) solution [31] for the 
pair $(Q_g, C_w)$ if $Q_g \wedge Q \models C_w\theta$, $\mathrm{dom}(\theta) \cap \mathrm{fuv}(Q_g) = \emptyset$, and $\mathrm{dom}(\theta) \cap \mathrm{fuv}(Q) = \emptyset$. 
A solution is called guess-free [31] if it satisfies 
$Q_g \wedge C_w \models Q \wedge \bigwedge_{\pi \in \mathrm{dom}(\theta)} (\pi = \theta(\pi)) \wedge \bigwedge_{\alpha \in \mathrm{dom}(\theta)} (\alpha \sim \theta(\alpha))$ in addition. Intuitively, a guess-free solution consists 
of necessary conditions required for a wanted constraint $C_w$ to hold, assuming 
a given constraint $Q_g$. For example, for $(\top, \alpha \sim (\beta \to_1 \beta))$, 
$(\top, [\alpha \mapsto (\mathsf{Int} \to_1 \mathsf{Int}), \beta \mapsto \mathsf{Int}])$ is a solution but not guess-free. Very roughly speaking, 
$(Q, \theta)$ being a guess-free solution of $(Q_g, C_w)$ means that $(Q, \theta)$ is equivalent to $C_w$ 
under the assumption $Q_g$. There can be multiple guess-free solutions; for example, 
for $(\top, \pi \le 1)$, both $(\pi \le 1, \emptyset)$ and $(\top, [\pi \mapsto 1])$ are guess-free solutions. 
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Lemma 10 (Soundness and Principality of Simplification). If Q simp 
C ~ Q';0, (Q',0) is a guess-free solution for (Q, C). 


Lemma 11 (Completeness of Simplification). If (Q',0) is a solution for 
(Q,C) where Q is satisfiable, then Q simp C ~œ Q”; 0’ for some Q” and 6’. 


Theorem 2 (Soundness of Inference). Suppose It» e : 7 ~ A; C and there 
is a solution (Q, 0) for (T,C). Then, we have Q; T0; A0 F e: 78. 


Theorem 3 (Completeness and Principality of Inference). Suppose I’ > 
e: 7 ~ A;C. Suppose also that Q’; r0’; A’ F e : r’ for some substitution 6’ 
on unification variables such that dom(6’) C fuv(I’) and dom(6’) N fuv(Q’) = 4. 
Then, there exists 0 such that dom(@) \ dom(6’) C X, (Q’,@) is a solution for 
(T,C), Q! E laom = 8, Q = T8 ~ 7’, and Q’ — A0 < A’, where X is the 
set of unification variables introduced in the derivation. 


Note that the constraint generation $\Gamma \triangleright e : \tau \leadsto \Delta; C$ always succeeds,
whereas the generated constraints may possibly be conflicting. Theorem 3 states
that such a case cannot happen when $e$ is well-typed under the rules in Fig. 2.


Incompleteness in Typing Programs. It may sound contradictory to Theorem 3,
but the type inference is indeed incomplete for checking type-annotated bindings.
Recall that the typing rule for type-annotated bindings requires that the resulting
constraint after simplification must be $\top$. However, even when there exists a
solution of the form $(\top, \theta)$ for $(Q, C)$, there can be no guess-free solution of this
form. For example, $(\top, \pi \le \pi')$ has a solution $(\top, [\pi \mapsto \pi'])$, but there are no
guess-free solutions of the required form. Also, even though there exists a guess-free
solution of the form $(\top, \theta)$, the simplification may not return the solution, as
guess-free solutions are not always unique. For example, for $(\top, \pi \le \pi' \wedge \pi' \le \pi)$,
$(\top, [\pi \mapsto \pi'])$ is a guess-free solution, whereas we have
$\top \vdash_{\mathrm{simp}} \pi \le \pi' \wedge \pi' \le \pi \leadsto \pi \le \pi' \wedge \pi' \le \pi; \emptyset$.
The source of the issue is that constraints on multiplicities
can (also) be solved by substitutions.

Fortunately, this issue disappears when we consider disambiguation in Sect. 4. 
By disambiguation, we can eliminate constraints for internally-introduced multi- 
plicity unification variables that are invisible from the outside. As a result, after 
processing equality constraints, we essentially need only consider rigid multiplicity 
variables when checking entailment for annotated top-level bindings. 


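Promoting Equalities to Substitutions. The inference can infer polytypes
$\forall p.\ p \le 1 \Rightarrow \mathsf{Int} \to_{p} \mathsf{Int}$ and
$\forall p_1\, p_2.\ (p_1 \le p_2 \wedge p_2 \le p_1) \Rightarrow \mathsf{Int} \to_{p_1} \mathsf{Int} \to_{p_2} \mathsf{Int}$, while
programmers would prefer the simpler types $\mathsf{Int} \to_{1} \mathsf{Int}$ and $\forall p.\ \mathsf{Int} \to_{p} \mathsf{Int} \to_{p} \mathsf{Int}$;
the simplification so far does not yield substitutions on multiplicity unification
variables. Adding the following rule remedies the situation:

\[
\frac{\pi \notin \mathrm{fuv}(Q) \qquad \pi \ne \mu \qquad
      Q \wedge Q_w \models \pi \le \mu \wedge \mu \le \pi \qquad
      Q \vdash_{\mathrm{simp}} (Q_w \wedge C)[\pi \mapsto \mu] \leadsto Q'; \theta}
     {Q \vdash_{\mathrm{simp}} Q_w \wedge C \leadsto Q'; \theta \circ [\pi \mapsto \mu]}
\;\textsc{S-Eq}
\]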




This rule says that if $\pi = \mu$ must hold for $Q_w \wedge C$ to hold, the simplification yields
the substitution $[\pi \mapsto \mu]$. The condition $\pi \notin \mathrm{fuv}(Q)$ is required for Lemma 10; a
solution cannot substitute variables in $Q$. Note that this rule essentially finds an
improving substitution [16].

Using the rule is optional. Our prototype implementation actually uses S-Eq
only for $Q_w$ for which we can find $\mu$ easily: $M \le 1$, $\omega \le \mu$, and looping chains
$\mu_1 \le \mu_2 \wedge \dots \wedge \mu_{n-1} \le \mu_n \wedge \mu_n \le \mu_1$.
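
As a rough illustration of the looping-chain case, the following sketch collects variables that lie on a common cycle of $\le$ predicates and maps each of them to one representative. It is only a sketch under stated assumptions: predicates are pairs of variable names, the function name is hypothetical, and the side condition $\pi \notin \mathrm{fuv}(Q)$ of S-Eq is not checked here.

```python
def equalities_from_chains(preds):
    """preds: iterable of (lo, hi) pairs of variable names, each meaning lo <= hi.
    Returns a substitution {variable: representative} for variables that lie on a
    common cycle lo <= ... <= lo, so all of them must be equal."""
    succ = {}
    for lo, hi in preds:
        succ.setdefault(lo, set()).add(hi)
        succ.setdefault(hi, set())

    def reachable(start):
        # Variables reachable from start in one or more <= steps.
        seen, stack = set(), [start]
        while stack:
            v = stack.pop()
            for n in succ[v]:
                if n not in seen:
                    seen.add(n)
                    stack.append(n)
        return seen

    reach = {v: reachable(v) for v in succ}
    subst = {}
    for v in succ:
        # Variables mutually reachable with v (this set contains v iff v is on a cycle).
        cycle_mates = {u for u in reach[v] if v in reach[u]}
        if cycle_mates:
            rep = min(cycle_mates)  # arbitrary but deterministic representative
            if rep != v:
                subst[v] = rep
    return subst
```

For the constraint $p_1 \le p_2 \wedge p_2 \le p_1$ from the text, `equalities_from_chains([("p1", "p2"), ("p2", "p1")])` yields a substitution that identifies the two variables, which corresponds to the simpler type $\forall p.\ \mathsf{Int} \to_{p} \mathsf{Int} \to_{p} \mathsf{Int}$.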


3.2 Entailment Checking by Horn SAT Solving 


The simplification rules rely on the check of entailment $Q \models \phi$. For the constraints
in this system, we can perform this check in quadratic time at worst but in linear
time for most cases. Specifically, we reduce the checking of $Q \models \phi$ to satisfiability of
propositional Horn formulas (Horn SAT), which is known to be solved in linear
time in the number of occurrences of literals [10], where the reduction (precisely,
the preprocessing of the reduction) may increase the problem size quadratically.
The idea of using Horn SAT for constraint solving in linear typing can be found
in Mogensen [23].

First, as a preprocess, we normalize both given and wanted constraints by
the following rules:


– Replace $M_1 \cdot M_2 \le M$ with $M_1 \le M \wedge M_2 \le M$.
– Replace $M \cdot 1$ and $1 \cdot M$ with $M$, and $M \cdot \omega$ and $\omega \cdot M$ with $\omega$.
– Remove trivial predicates $1 \le M$ and $M \le \omega$.


After this, each predicate $\phi$ has the form $\mu \le \prod_i \nu_i$.
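
The three normalization rules can be sketched as follows. The encoding is an assumption made for this illustration only: a multiplicity is a tuple of atoms, where an atom is `"1"`, `"w"` (standing for $\omega$), or a variable name, and a predicate $M_1 \le M_2$ is a pair of such tuples. The function names are hypothetical.

```python
def simplify_product(atoms):
    """Apply M*1 = M and M*w = w. Returns ("w",) or a sorted tuple of variables;
    the empty tuple denotes 1. Duplicates are dropped, which is sound because
    multiplication on {1, w} is idempotent."""
    if "w" in atoms:
        return ("w",)
    return tuple(sorted({a for a in atoms if a != "1"}))

def normalize(preds):
    """Return a set of predicates (mu, rhs) meaning mu <= product of rhs,
    where mu is a variable or "w" and rhs is a tuple of variables."""
    out = set()
    for lhs, rhs in preds:
        rhs_n = simplify_product(rhs)
        if rhs_n == ("w",):
            continue                      # M <= w is trivial
        lhs_n = simplify_product(lhs)
        if lhs_n == ("w",):
            out.add(("w", rhs_n))         # w <= M is kept as is
        else:
            for mu in lhs_n:              # M1*M2 <= M splits; 1 <= M disappears
                out.add((mu, rhs_n))
    return out
```

For example, `normalize({(("p", "q"), ("r",))})` splits $p \cdot q \le r$ into $p \le r$ and $q \le r$.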
After the normalization above, we can reduce the entailment checking to
satisfiability. Specifically, we use the following property:

\[
Q \models \mu \le \prod_i \nu_i
\quad\text{iff}\quad
Q \wedge \bigwedge_i (\nu_i \le 1) \wedge (\omega \le \mu) \text{ is unsatisfiable}
\]

Here, the constraint $Q \wedge \bigwedge_i (\nu_i \le 1) \wedge (\omega \le \mu)$ intuitively asserts that there exists
a counterexample of $Q \models \mu \le \prod_i \nu_i$.

Then, it is straightforward to reduce the satisfiability of $Q$ to Horn SAT;
we just map $1$ to true and $\omega$ to false and accordingly map $\le$ and $\cdot$ to $\Leftarrow$ and
$\wedge$, respectively. Since Horn SAT can be solved in linear time in the number of
occurrences of literals [10], the reduction also shows that the satisfiability of $Q$ is
checked in linear time in the size of $Q$ if $Q$ is normalized.


Corollary 1. Checking $Q \models \phi$ is in linear time if $Q$ and $\phi$ are normalized.
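
The reduction can be sketched in a few lines. This is an illustration under stated assumptions, not the prototype's code: a normalized predicate $\mu \le \prod_i \nu_i$ is encoded as a pair `(mu, nus)`, read as the Horn clause "if all `nus` are true then `mu` is true", with `"w"` playing the role of false. The satisfiability check uses naive forward chaining, which is simpler than (and not as fast as) the linear-time algorithm cited above.

```python
def horn_sat(clauses):
    """clauses: iterable of (head, body); head is a variable name or "w" (false),
    body is a tuple of variable names. Returns True iff the clauses are satisfiable.
    Naive forward chaining computing the least model."""
    clauses = list(clauses)
    true_vars = set()
    changed = True
    while changed:
        changed = False
        for head, body in clauses:
            if all(v in true_vars for v in body):
                if head == "w":
                    return False          # a goal clause fired: contradiction
                if head not in true_vars:
                    true_vars.add(head)
                    changed = True
    return True

def entails(q, mu, nus):
    """Check Q |= mu <= product(nus) by asking whether
    Q /\\ (nu_i <= 1 for all i) /\\ (w <= mu) is unsatisfiable."""
    clauses = list(q)
    clauses += [(nu, ()) for nu in nus]   # nu_i <= 1, i.e. nu_i is true
    if mu != "w":
        clauses.append(("w", (mu,)))      # w <= mu, i.e. mu is false
    return not horn_sat(clauses)
```

With `q = [("p", ("q",)), ("q", ("r",))]`, which encodes $p \le q \wedge q \le r$, the call `entails(q, "p", ("r",))` succeeds by transitivity, whereas `entails(q, "r", ("p",))` does not.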


The normalization of constraints can duplicate $M$ of $\dots \le M$, and thus
increases the size quadratically in the worst case. Fortunately, the quadratic
increase is not common because the size of $M$ is bounded in practice, in many cases
by one. Among the rules in Fig. 2, the only rule that introduces a non-singleton
$M$ in the right-hand side of $\le$ is Case for a constructor whose arguments'
multiplicities are non-constants, such as $\mathsf{MkMany} : \forall p\, a.\ a \to_{p} \mathsf{Many}\ p\ a$. However,
it often suffices to use non-multiplicity-parameterized constructors, such as
$\mathsf{Cons} : \forall a.\ a \to_{1} \mathsf{List}\ a \to_{1} \mathsf{List}\ a$, because such constructors can be used to
construct or deconstruct both linear and unrestricted data.


3.3 Issue: Inference of Ambiguous Types 


The inference system so far looks nice; the system is sound and complete, and 
infers principal types. However, there still exists an issue to overcome for the 
system to be useful: it often infers ambiguous types [15,27] in which internal 
multiplicity variables leak out to reveal internal implementation details. 
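Consider $\mathit{app}' = \lambda f.\lambda x.\ \mathit{app}\ f\ x$ for $\mathit{app} = \lambda f.\lambda x.\ f\ x$ from Example 1. We
would expect that equivalent types are inferred for app' and app. However, this
is not the case for the inference system. In fact, the system infers the following
type for app' (here we reproduce the inferred type of app for comparison):

\[
\begin{array}{l}
\mathit{app} : \forall p\, p_f\, p_x\, a\, b.\ (p \le p_x) \Rightarrow (a \to_{p} b) \to_{p_f} a \to_{p_x} b \\
\mathit{app}' : \forall q\, q_f\, q_x\, p_f\, p_x\, a\, b.\ (q \le q_x \wedge q_f \le p_f \wedge q_x \le p_x) \Rightarrow (a \to_{q} b) \to_{p_f} a \to_{p_x} b
\end{array}
\]

We highlight why this type is inferred as follows.


– By abstractions, $f$ is assigned to type $\alpha_f$ and multiplicity $\pi_f$, and $x$ is
  assigned to type $\alpha_x$ and multiplicity $\pi_x$.
– By its use, app is instantiated to type $(\alpha' \to_{\pi'} \beta') \to_{\pi'_f} \alpha' \to_{\pi'_x} \beta'$ with
  constraint $\pi' \le \pi'_x$.
– For $\mathit{app}\ f$, the system infers type $\beta$ with constraint
  $((\alpha' \to_{\pi'} \beta') \to_{\pi'_f} \alpha' \to_{\pi'_x} \beta') \sim (\alpha_f \to_{\pi_1} \beta)$. At the same time, the variable use in
  the expression is inferred as $\mathit{app}^{1}, f^{\pi_1}$.
– For $(\mathit{app}\ f\ x)$, the system infers type $\gamma$ with constraint $\beta \sim (\alpha' \to_{\pi_2} \gamma)$. At
  the same time, the variable use in the expression is inferred as $\mathit{app}^{1}, f^{\pi_1}, x^{\pi_2}$.
– As a result, $\lambda f.\lambda x.\ \mathit{app}\ f\ x$ has type $\alpha_f \to_{\pi_f} \alpha_x \to_{\pi_x} \gamma$, yielding constraints
  $\pi_1 \le \pi_f \wedge \pi_2 \le \pi_x$.


Then, for the gathered constraints, by simplification (including S-Eq), we obtain
a (guess-free) solution $(Q, \theta)$ such that $Q = (\pi'_f \le \pi_f \wedge \pi' \le \pi'_x \wedge \pi'_x \le \pi_x)$ and
$\theta = [\alpha_f \mapsto (\alpha' \to_{\pi'} \beta'), \pi_1 \mapsto \pi'_f, \beta \mapsto (\alpha' \to_{\pi'_x} \beta'), \pi_2 \mapsto \pi'_x, \gamma \mapsto \beta']$. Then,
after generalizing $(\alpha_f \to_{\pi_f} \alpha_x \to_{\pi_x} \gamma)\theta = (\alpha' \to_{\pi'} \beta') \to_{\pi_f} \alpha' \to_{\pi_x} \beta'$, we obtain
the inferred type above.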

There are two problems with this inference result:


– The type of app' is ambiguous in the sense that the type-level variables in the
  constraint cannot be determined only by those that appear in the type [15,27].
  Usually, ambiguous types are undesirable, especially when their instantiation
  affects runtime behavior [15,27,31].

– Due to this ambiguity, the types of app and app' are not judged equivalent
  by the inference system. For example, the inference rejects the binding
  $\mathit{app}'' : \forall p\, p_f\, p_x\, a\, b.\ (p \le p_x) \Rightarrow (a \to_{p} b) \to_{p_f} a \to_{p_x} b = \mathit{app}'$ because the
  system does not know how to instantiate the ambiguous type-level variables
  $q_f$ and $q_x$, while the binding is valid in the type system in Sect. 2.

Inference of ambiguous types is common in the system; it is easily caused by
using defined variables. Rejecting ambiguous types is not a solution for our case
because it rejects many programs. Defaulting such ambiguous type-level variables
to $1$ or $\omega$ is not a solution either because it loses principality in general. However,
we have no other choices than to reject ambiguous types, as long as multiplicities
are relevant in runtime behavior.

In the next section, we will show how we address the ambiguity issue under
the assumption that multiplicities are irrelevant at runtime. Under this
assumption, it is no problem to have multiplicity-monomorphic primitives such
as array processing primitives (e.g., $\mathsf{readMArray} : \forall a.\ \mathsf{MArray}\ a \to_{1} \mathsf{Int} \to_{\omega} (\mathsf{MArray}\ a \otimes \mathsf{Un}\ a)$) [31].
Note that this assumption does not rule out all
multiplicity-polymorphic primitives; it just prohibits the primitives from
inspecting multiplicities at runtime.


4 Disambiguation by Quantifier Elimination 


In this section, we address the issue of ambiguous and leaky types by using
quantifier elimination. The basic idea is simple; we just view the type of app' as

\[
\mathit{app}' : \forall q\, p_f\, p_x\, a\, b.\ (\exists q_x\, q_f.\ q \le q_x \wedge q_f \le p_f \wedge q_x \le p_x) \Rightarrow (a \to_{q} b) \to_{p_f} a \to_{p_x} b
\]

In this case, the constraint $(\exists q_x\, q_f.\ q \le q_x \wedge q_f \le p_f \wedge q_x \le p_x)$ is logically
equivalent to $q \le p_x$, and thus we can infer the equivalent types for both app
and app'. Fortunately, such quantifier elimination is always possible for our
representation of constraints; that is, for $\exists p.Q$, there always exists $Q'$ that is logically
equivalent to $\exists p.Q$. A technical subtlety is that, although we perform quantifier
elimination after generalization in the above explanation, we actually perform
quantifier elimination just before generalization, or more precisely, as a final step
of simplification, for compatibility with the simplification in OutsideIn(X) [31],
especially in the treatment of local assumptions.


4.1 Elimination of Existential Quantifiers 


The elimination of existential quantifiers is rather easy; we simply use the
well-known fact that a disjunction of a Horn clause and a definite clause can also be
represented as a Horn clause. Regarding our encoding of normalized predicates
(Sect. 3.2) that maps $\mu \le M$ to a Horn clause, the fact can be rephrased as:


Lemma 12. $(\mu \le M \vee \omega \le M') \equiv \mu \le M \cdot M'$.


Here, we extend constraints to include $\vee$ and write $\equiv$ for the logical equivalence;
that is, $Q \equiv Q'$ if and only if $Q \models Q'$ and $Q' \models Q$.
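
Since multiplicities range over $\{1, \omega\}$, Lemma 12 can be confirmed by exhaustive enumeration of the values of $\mu$, $M$, and $M'$. The sketch below does exactly that; the string encoding (`"1"`, and `"w"` for $\omega$) and the helper names are assumptions made for this illustration.

```python
from itertools import product

ONE, OMEGA = "1", "w"

def leq(m1, m2):
    """The order on multiplicities: only w <= 1 fails."""
    return not (m1 == OMEGA and m2 == ONE)

def mul(m1, m2):
    """Multiplication: 1 is the unit and w is absorbing."""
    return ONE if (m1 == ONE and m2 == ONE) else OMEGA

def lemma12_holds():
    """Check (mu <= M or w <= M') == (mu <= M * M') for all values in {1, w}."""
    for mu, m, m_prime in product((ONE, OMEGA), repeat=3):
        lhs = leq(mu, m) or leq(OMEGA, m_prime)
        rhs = leq(mu, mul(m, m_prime))
        if lhs != rhs:
            return False
    return True
```

Checking values rather than syntactic products is enough here, because every product of variables evaluates to either $1$ or $\omega$ under any assignment.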
As a corollary, we obtain the following result:


Corollary 2. There effectively exists a quantifier-free constraint $Q'$, denoted by
$\mathrm{elim}(\exists\pi.Q)$, such that $Q'$ is logically equivalent to $\exists\pi.Q$.

Proof. Note that $\exists\pi.Q$ means $Q[\pi \mapsto 1] \vee Q[\pi \mapsto \omega]$ because $\pi$ ranges over $\{1, \omega\}$.
We safely assume that $Q$ is normalized (Sect. 3.2) and that $Q$ does not contain a
predicate $\pi \le M$ where $\pi$ appears also in $M$, because such a predicate trivially
holds.

We define $\Phi_1$, $\Phi_\omega$, and $Q_{\mathrm{rest}}$ as
$\Phi_1 = \{\mu \le M \mid (\mu \le \pi \cdot M) \in Q, \mu \ne \pi\}$,
$\Phi_\omega = \{\omega \le M \mid (\pi \le M) \in Q, \pi \notin \mathrm{fuv}(M)\}$, and
$Q_{\mathrm{rest}} = \bigwedge \{\phi \mid \phi \in Q, \pi \notin \mathrm{fuv}(\phi)\}$. Here,
we abused the notation to write $\phi \in Q$ to mean that $Q = \bigwedge_i \phi_i$ and $\phi = \phi_i$
for some $i$. In the construction of $\Phi_1$, we assumed the monoid laws of $(\cdot)$;
the definition says that we remove $\pi$ from the right-hand sides and $M$ becomes
$1$ if the right-hand side is $\pi$. By construction, $Q[\pi \mapsto 1]$ and $Q[\pi \mapsto \omega]$
are equivalent to $(\bigwedge \Phi_1) \wedge Q_{\mathrm{rest}}$ and $(\bigwedge \Phi_\omega) \wedge Q_{\mathrm{rest}}$, respectively. Thus, by
Lemma 12 and by the distributivity of $\vee$ over $\wedge$ it suffices to define $Q'$ as
$Q' = (\bigwedge \{\mu \le M \cdot M' \mid (\mu \le M) \in \Phi_1, (\omega \le M') \in \Phi_\omega\}) \wedge Q_{\mathrm{rest}}$.


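Example 2. Consider $Q = (\pi'_f \le \pi_f \wedge \pi' \le \pi'_x \wedge \pi'_x \le \pi_x)$; this is the constraint
obtained from $\lambda f.\lambda x.\ \mathit{app}\ f\ x$ (Sect. 3.3). Since $\pi'_f$ and $\pi'_x$ do not appear in the
inferred type $(\alpha' \to_{\pi'} \beta') \to_{\pi_f} \alpha' \to_{\pi_x} \beta'$, we want to eliminate them by the
above step. There is a freedom to choose which variable is eliminated first. Here,
we shall choose $\pi'_f$ first.

First, we have $\mathrm{elim}(\exists\pi'_f.Q) = \pi' \le \pi'_x \wedge \pi'_x \le \pi_x$ because for this case
we have $\Phi_1 = \emptyset$, $\Phi_\omega = \{\omega \le \pi_f\}$, and $Q_{\mathrm{rest}} = \pi' \le \pi'_x \wedge \pi'_x \le \pi_x$. We then
have $\mathrm{elim}(\exists\pi'_x.\ \pi' \le \pi'_x \wedge \pi'_x \le \pi_x) = \pi' \le \pi_x$ because for this case we have
$\Phi_1 = \{\pi' \le 1\}$, $\Phi_\omega = \{\omega \le \pi_x\}$, and $Q_{\mathrm{rest}} = \top$.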
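
The construction in the proof of Corollary 2 can be sketched directly, and running it on the constraint of Example 2 reproduces the two elimination steps. The encoding is an assumption made for this illustration: a normalized predicate $\mu \le M$ is a pair `(mu, rhs)` where `mu` is a variable name or `"w"` and `rhs` is a frozenset of variable names (the empty set denotes $1$); the function name `elim` follows the paper's notation, but the code is not the prototype's.

```python
def elim(pi, preds):
    """Eliminate the variable pi from a conjunction of normalized predicates,
    following the Phi_1 / Phi_w / Q_rest construction."""
    phi_1, phi_w, rest = [], [], []
    for lhs, rhs in preds:
        if lhs == pi and pi in rhs:
            continue                          # pi <= pi * M holds trivially
        if pi in rhs:
            phi_1.append((lhs, rhs - {pi}))   # pi := 1 removes pi from the right
        elif lhs == pi:
            phi_w.append(rhs)                 # pi := w leaves w <= M
        else:
            rest.append((lhs, rhs))
    # Lemma 12: (mu <= M) or (w <= M') is equivalent to mu <= M * M'.
    combined = [(mu, m | m_prime) for mu, m in phi_1 for m_prime in phi_w]
    return set(combined) | set(rest)

# Example 2, with hypothetical names: pf' <= pf, p' <= px', px' <= px.
Q = {
    ("pf'", frozenset({"pf"})),
    ("p'", frozenset({"px'"})),
    ("px'", frozenset({"px"})),
}
step1 = elim("pf'", Q)       # leaves p' <= px' and px' <= px
step2 = elim("px'", step1)   # leaves p' <= px
```

When either $\Phi_1$ or $\Phi_\omega$ is empty, the combined part is empty and only $Q_{\mathrm{rest}}$ remains, which matches the first step of Example 2.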


In the worst case, the size of $\mathrm{elim}(\exists\pi.Q)$ can be quadratic to that of $Q$. Thus,
repeating elimination can make the constraints exponentially bigger. We believe
that such blow-up rarely happens because it is usual that $\pi$ occurs only in a few
predicates in $Q$. Also, recall that non-singleton right-hand sides are caused only
by multiplicity-parameterized constructors. When each right-hand side of $\le$ is a
singleton in $Q$, the same holds in $\mathrm{elim}(\exists\pi.Q)$. For such a case, the exponential
blow-up cannot happen because the size of constraints in the form is at most
quadratic in the number of multiplicity variables.


4.2 Modified Typing Rules 


As mentioned at the beginning of this section, we perform quantifier elimination as
the last step of simplification. To do so, we define $Q \vdash^{\tau}_{\mathrm{simp}} C \leadsto Q''; \theta$ as follows:

\[
\frac{Q \vdash_{\mathrm{simp}} C \leadsto Q'; \theta \qquad
      \{\overline{\pi}\} = \mathrm{fuv}(Q') \setminus \mathrm{fuv}(\tau\theta) \qquad
      Q'' = \mathrm{elim}(\exists\overline{\pi}.Q')}
     {Q \vdash^{\tau}_{\mathrm{simp}} C \leadsto Q''; \theta}
\]

Here, $\tau$ is used to determine which unification variables will be ambiguous after
generalization. We simply identify variables ($\overline{\pi}$ above) that are not in $\tau$ as
ambiguous [15] for simplicity. This check is indeed conservative in a more general
definition of ambiguity [27], in which $\forall p\, r\, a.\ (p \le r, r \le p) \Rightarrow a \to_{p} a$ for example
is not judged as ambiguous because $r$ is determined by $p$.
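
The side condition $\{\overline{\pi}\} = \mathrm{fuv}(Q') \setminus \mathrm{fuv}(\tau\theta)$ amounts to a set difference. A minimal sketch, assuming predicates are encoded as pairs `(mu, rhs)` of a left-hand variable (or `"w"`) and a collection of right-hand variables, and assuming the free variables of the type are already given as a set; both function names are hypothetical:

```python
def fuv_constraint(preds):
    """Collect the variables occurring in a conjunction of predicates (mu, rhs)."""
    variables = set()
    for lhs, rhs in preds:
        variables.add(lhs)
        variables.update(rhs)
    variables.discard("w")   # constants are not variables
    variables.discard("1")
    return variables

def ambiguous_vars(preds, type_vars):
    """Variables of the constraint that do not occur in the type; these are the
    ones handed to quantifier elimination."""
    return fuv_constraint(preds) - set(type_vars)
```

For the constraint of app' in Sect. 3.3, the variables $\pi'_f$ and $\pi'_x$ are the ones selected, since they occur in the constraint but not in the type.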
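Then, we replace the original simplification with the above-defined version.

\[
\frac{\begin{array}{c}
      \Gamma \triangleright e : \tau \leadsto \Delta; C \qquad
      \top \vdash^{\tau}_{\mathrm{simp}} C \leadsto Q; \theta \qquad
      \{\overline{\pi}\,\overline{\alpha}\} = \mathrm{fuv}(Q, \tau\theta) \\
      \overline{p}, \overline{a} : \text{fresh} \qquad
      \Gamma, f : \forall \overline{p}\,\overline{a}.(Q \Rightarrow \tau\theta)[\overline{\alpha \mapsto a}, \overline{\pi \mapsto p}] \vdash \mathit{prog}
      \end{array}}
     {\Gamma \vdash f = e; \mathit{prog}}
\]

\[
\frac{\Gamma \triangleright e : \sigma \leadsto \Delta; C \qquad
      Q \vdash^{\sigma}_{\mathrm{simp}} C \wedge \tau \sim \sigma \leadsto \top; \theta \qquad
      \Gamma, f : \forall \overline{p}\,\overline{a}.Q \Rightarrow \tau \vdash \mathit{prog}}
     {\Gamma \vdash f : (\forall \overline{p}\,\overline{a}.Q \Rightarrow \tau) = e; \mathit{prog}}
\]

Here, the changed parts are the uses of $\vdash^{\tau}_{\mathrm{simp}}$ and $\vdash^{\sigma}_{\mathrm{simp}}$ in place of $\vdash_{\mathrm{simp}}$.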


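Example 3. Consider $(Q, \theta)$ in Sect. 3.3 such that $Q = (\pi'_f \le \pi_f \wedge \pi' \le \pi'_x \wedge \pi'_x \le \pi_x)$
and $\theta = [\alpha_f \mapsto (\alpha' \to_{\pi'} \beta'), \pi_1 \mapsto \pi'_f, \beta \mapsto (\alpha' \to_{\pi'_x} \beta'), \pi_2 \mapsto \pi'_x, \gamma \mapsto \beta']$,
which is obtained after simplification of the gathered constraint. Following
Example 2, eliminating variables that are not in $\tau\theta = (\alpha' \to_{\pi'} \beta') \to_{\pi_f} \alpha' \to_{\pi_x} \beta'$
yields the constraint $\pi' \le \pi_x$. As a result, by generalization, we obtain the
polytype

\[
\forall q\, p_f\, p_x\, a\, b.\ (q \le p_x) \Rightarrow (a \to_{q} b) \to_{p_f} a \to_{p_x} b
\]

for app', which is equivalent to the inferred type of app.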


Note that $(Q', \theta)$ of $Q \vdash^{\tau}_{\mathrm{simp}} C \leadsto Q'; \theta$ is no longer a solution of $(Q, C)$
because $C$ can have eliminated variables. However, it is safe to use this version
when generalization takes place, because, for variables $\overline{q}$ that do not occur in $\tau$,
$\forall \overline{p}\,\overline{q}\,\overline{a}.\ Q \Rightarrow \tau$ and $\forall \overline{p}\,\overline{a}.\ Q' \Rightarrow \tau$ have the same set of monomorphic instances, if
$\exists\overline{q}.Q$ is logically equivalent to $Q'$. Note that in this type system simplification
happens only before (implicit) generalization takes place.


5 Extension to Local Assumptions 


In this section, following OUTSIDEIN(X) [31], we extend our system with local 
assumptions, which enable us to have lets and GADTs. We focus on the treatment 
of lets in this section because type inference for lets involves a linearity-specific 
concern: the multiplicity of a let-bound variable. 


5.1 “Let Should Not Be Generalized” for Our Case 


We first discuss that even for our case “let should not be generalized” [31]. That 
is, generalization of let sometimes results in counter-intuitive typing and conflicts 
with the discussions so far. 

Consider the following program: 


    h = λf.λk.let y = f (λx.k x) in 0


Suppose for simplicity that $f$ and $k$ have types $(a \to_{\pi_1} b) \to_{\pi_2} c$ and $a \to_{\pi_3} b$,
respectively (here we only focus on the treatment of multiplicity). Then, $f\ (\lambda x.k\ x)$
has type $c$ with the constraint $\pi_3 \le \pi_1$. Thus, after generalization, $y$ has type
$\pi_3 \le \pi_1 \Rightarrow c$, where $\pi_3$ and $\pi_1$ are neither generalized nor eliminated because
they escape from the definition of $y$. As a result, $h$ has type
$\forall p_1\, p_2\, p_3\, a\, b\, c.\ ((a \to_{p_1} b) \to_{p_2} c) \to_{\omega} (a \to_{p_3} b) \to_{\omega} \mathsf{Int}$; there is no constraint $p_3 \le p_1$ because the
definition of $y$ does not yield a constraint. This nonexistence of the constraint
would be counter-intuitive because users wrote $f\ (\lambda x.k\ x)$ while the constraint
for the expression is not imposed. In particular, it does not cause an error even
when $f : (a \to_{1} b) \to_{1} c$ and $k : a \to_{\omega} b$, while $f\ (\lambda x.k\ x)$ becomes illegal for this
case. Also, if we change 0 to $y$, the error happens at the use site instead of the
definition site. Moreover, the type is fragile as it depends on whether $y$ occurs or
not; for example, if we change 0 to $\mathit{const}\ 0\ y$ where $\mathit{const} = \lambda a.\lambda b.a$, the type of
$h$ changes to $\forall p_1\, p_2\, p_3\, a\, b\, c.\ p_3 \le p_1 \Rightarrow ((a \to_{p_1} b) \to_{p_2} c) \to_{\omega} (a \to_{p_3} b) \to_{\omega} \mathsf{Int}$.
In this discussion, we do not consider type-equality constraints, but there are no
legitimate reasons why type-equality constraints are solved on the fly in typing $y$.

As demonstrated in the above example, “let should not be generalized” [30,31] 
in our case. Thus, we adopt the same principle in OUTSIDEIN(X) that let will 
be generalized only if users write a type annotation for it [31]. This principle is 
also adopted in GHC (as of 6.12.1 when the language option MonoLocalBinds is 
turned on) with a slight relaxation to generalize closed bindings. 


5.2 Multiplicity of Let-Bound Variables 


Another issue with let-generalization, which is specific to linear typing, is that a 
generalization result depends on the multiplicity of the let-bound variable. Let 
us consider the following program, where we want to generalize the type of y 
(even without a type annotation): 


    g = λx.let y = λf.f x in y not


Suppose for simplicity that not has type $\mathsf{Bool} \to_{1} \mathsf{Bool}$ and $x$ has type $\mathsf{Bool}$ already
in typing let. Then, $y$'s body $\lambda f.f\ x$ has a monotype $(\mathsf{Bool} \to_{\pi} r) \to_{\pi'} r$ with
no constraints (on multiplicity). There are two generalization results depending
on the multiplicity $\pi_y$ of $y$ because the use of $x$ also escapes in the type system.


– If $\pi_y = 1$, the type is generalized into $\forall q\, r.\ (\mathsf{Bool} \to_{\pi} r) \to_{q} r$, where $\pi$ is not
  generalized because the use of $x$ in $y$'s body is $\pi$.

– If $\pi_y = \omega$, the type is generalized into $\forall p\, q\, r.\ (\mathsf{Bool} \to_{p} r) \to_{q} r$, where $\pi$ is
  generalized (to $p$) because the use of $x$ in $y$'s body is $\omega$.


A difficulty here is that $\pi_y$ needs to be determined at the definition of $y$, while
the constraint on $\pi_y$ is only obtained from the use of $y$.

Our design choice is the latter; the multiplicity of a generalizable let-bound 
variable is w in the system. One justification for this choice is that a motivation 
of polymorphic typing is to enhance reusability, while reuse is not possible for 
variables with multiplicity 1. Another justification is compatibility with recursive 
definitions, where recursively-defined variables must have multiplicity w; it might 
be confusing, for example, if the multiplicity of a list-manipulation function 
changes after we change its definition from an explicit recursion to foldr. 
5.3 Inference Rule for Lets


In summary, the following are our criteria about let generalization:


– Only lets with polymorphic type annotations are generalized.
– Variables introduced by let to be generalized have multiplicity $\omega$.


This idea can be represented by the following typing rule:

\[
\frac{\begin{array}{c}
      \Gamma \triangleright e_1 : \tau_1 \leadsto \Delta_1; C_1 \qquad
      \{\overline{\pi}\,\overline{\alpha}\} = \mathrm{fuv}(\tau_1, C_1) \setminus \mathrm{fuv}(\Gamma) \\
      C_1' = \exists\overline{\pi}\,\overline{\alpha}.\ (Q \models^{\tau_1} C_1 \wedge \tau \sim \tau_1) \\
      \Gamma, x : (\forall \overline{p}\,\overline{a}.Q \Rightarrow \tau) \triangleright e_2 : \tau_2 \leadsto \Delta_2, x^{M}; C_2
      \end{array}}
     {\Gamma \triangleright \mathbf{let}\ x : (\forall \overline{p}\,\overline{a}.Q \Rightarrow \tau) = e_1\ \mathbf{in}\ e_2 : \tau_2 \leadsto \omega\Delta_1 + \Delta_2; C_1' \wedge C_2}
\;\textsc{LetA}
\]

(We do not discuss non-generalizable lets because they are typed as $(\lambda x.e_2)\ e_1$.)
Constraints like $\exists\overline{\pi}\,\overline{\alpha}.\ (Q \models^{\tau_1} C_1 \wedge \tau \sim \tau_1)$ above are called implication
constraints [31], which state that the entailment must hold only by instantiating
unification variables in $\overline{\pi}\,\overline{\alpha}$. There are two roles of implication constraints. One
is to delay the checking because $\tau_1$ and $C_1$ contain some unification variables
that will be made concrete after this point by solving $C_2$. The other is to guard
constraints; in the above example, since the constraints $C_1 \wedge \tau \sim \tau_1$ hold by
assuming $Q$, it is not safe to substitute variables outside $\overline{\pi}\,\overline{\alpha}$ in solving the
constraints because the equivalence might be a consequence of $Q$; recall that $Q$
affects type equality. We note that there is a slight deviation from the original
approach [31]; an implication constraint in our system is annotated by $\tau_1$ to
identify for which subset of $\{\overline{\pi}\,\overline{\alpha}\}$ the existence of a unique solution is not required
and thus quantifier elimination is possible, similarly to Sect. 4.


5.4 Solving Constraints 


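Now, the set of constraints is extended to include implication constraints.

\[
C ::= \bigwedge_i \psi_i \qquad \psi_i ::= \cdots \mid \exists\overline{\pi}\,\overline{\alpha}.\ (Q \models^{\tau} C)
\]

As we mentioned above, an implication constraint $\exists\overline{\pi}\,\overline{\alpha}.(Q \models^{\tau} C)$ means that
$Q \models C$ must hold by substituting $\overline{\pi}$ and $\overline{\alpha}$ with appropriate values, where we do
not require uniqueness of solutions for unification variables that do not appear
in $\tau$. That is, $Q \vdash^{\tau}_{\mathrm{simp}} C \leadsto \top; \theta$ must hold with $\mathrm{dom}(\theta) \subseteq \{\overline{\pi}\,\overline{\alpha}\}$.

Then, following OutsideIn(X) [31], we define the solving judgment
$\overline{\pi}\,\overline{\alpha}.\ Q \vdash^{\tau}_{\mathrm{solv}} C \leadsto Q'; \theta$, which states that we solve $(Q, C)$ as $(Q', \theta)$ where $\theta$ only touches
variables in $\overline{\pi}\,\overline{\alpha}$, where $\tau$ is used for disambiguation (Sect. 4). Let us write $\mathrm{impl}(C)$
for all the implication constraints in $C$, and $\mathrm{simpl}(C)$ for the rest. Then, we can
define the inference rules for the judgment simply by recursive simplification,
similarly to the original [31].

\[
\frac{\begin{array}{c}
      \overline{\pi}\,\overline{\alpha}.\ Q \vdash^{\tau}_{\mathrm{simp}} \mathrm{simpl}(C) \leadsto Q_r; \theta \\
      \{\overline{\pi_i}\,\overline{\alpha_i}.\ Q \wedge Q_i \wedge Q_r \vdash^{\tau_i}_{\mathrm{solv}} C_i \leadsto \top; \theta_i\}_{(\exists\overline{\pi_i}\,\overline{\alpha_i}.(Q_i \models^{\tau_i} C_i)) \in \mathrm{impl}(C\theta)}
      \end{array}}
     {\overline{\pi}\,\overline{\alpha}.\ Q \vdash^{\tau}_{\mathrm{solv}} C \leadsto Q_r; \theta}
\]

Here, $\overline{\pi}\,\overline{\alpha}.\ Q \vdash^{\tau}_{\mathrm{simp}} C \leadsto Q_r; \theta$ is a simplification relation defined similarly to
$Q \vdash^{\tau}_{\mathrm{simp}} C \leadsto Q_r; \theta$ except that we are allowed to touch only variables in $\overline{\pi}\,\overline{\alpha}$. We
omit the concrete rules for this version of simplification relation because they are
straightforward except that unification caused by S-Uni and S-Eq and quantifier
elimination (Sect. 4) are allowed only for variables in $\{\overline{\pi}\,\overline{\alpha}\}$.

Accordingly, we also change the typing rules for bindings to use the solving
relation instead of the simplification relation.

\[
\frac{\begin{array}{c}
      \Gamma \triangleright e : \tau \leadsto \Delta; C \qquad
      \mathrm{fuv}(C, \tau).\ \top \vdash^{\tau}_{\mathrm{solv}} C \leadsto Q; \theta \qquad
      \{\overline{\pi}\,\overline{\alpha}\} = \mathrm{fuv}(Q, \tau\theta) \\
      \overline{p}, \overline{a} : \text{fresh} \qquad
      \Gamma, f : \forall \overline{p}\,\overline{a}.(Q \Rightarrow \tau\theta)[\overline{\alpha \mapsto a}, \overline{\pi \mapsto p}] \vdash \mathit{prog}
      \end{array}}
     {\Gamma \vdash f = e; \mathit{prog}}
\]

\[
\frac{\Gamma \triangleright e : \sigma \leadsto \Delta; C \qquad
      \mathrm{fuv}(C, \sigma).\ Q \vdash^{\sigma}_{\mathrm{solv}} C \wedge \tau \sim \sigma \leadsto \top; \theta \qquad
      \Gamma, f : \forall \overline{p}\,\overline{a}.Q \Rightarrow \tau \vdash \mathit{prog}}
     {\Gamma \vdash f : (\forall \overline{p}\,\overline{a}.Q \Rightarrow \tau) = e; \mathit{prog}}
\]

Above, there are no unification variables other than $\mathrm{fuv}(C, \tau)$ or $\mathrm{fuv}(C, \sigma)$.

The definition of the solving judgment and the updated inference rules for
programs are the same as those in the original OutsideIn(X) [31] except $\tau$ for
disambiguation. This is one of the advantages of being based on OutsideIn(X).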


6 Implementation and Evaluation 


In this section, we evaluate the proposed inference method using our prototype 
implementation. We first report what types are inferred for functions from 
Prelude to see whether or not inferred types are reasonably simple. We then 
report the performance evaluation that measures efficiency of type inference and 
the overhead due to entailment checking and quantifier elimination. 


6.1 Implementation 


The implementation follows the present paper except for a few points. Following 
the implementation of OUTSIDEIN(X) in GHC, our type checker keeps a natural 
number, which we call an implication level, corresponding to the depth of implica- 
tion constraints, and a unification variable also accordingly keeps the implication 
level at which the variable is introduced. As usual, we represent unification 
variables by mutable references. We perform unification on the fly by destructive 
assignment, while unification of variables that have smaller implication levels than 
the current level is recorded for later checking of implication constraints; such a 
variable cannot be in $\overline{\pi}\,\overline{\alpha}$ of $\exists\overline{\pi}\,\overline{\alpha}.\ Q \models^{\tau} C$. The implementation supports GADTs
because they can be implemented rather easily by extending constraints Q to 
include type equalities, but does not support type classes because the handling 
of them requires another X of OUTSIDEIN(X). 

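Although we can use a linear-time Horn SAT solving algorithm [10] for
checking $Q \models \phi$, the implementation uses a general SAT solver based on DPLL [8,9]
because the unit propagation in DPLL works efficiently for Horn formulas.
We do not use external solvers, such as Z3, as we conjecture that the sizes of
formulas are usually small, and overhead to use external solvers would be high.

\[
\begin{array}{lcl}
(\circ) & : & (q \le s \wedge q \le t \wedge p \le t) \Rightarrow (b \to_{q} c) \to_{r} (a \to_{p} b) \to_{s} a \to_{t} c \\
\mathit{curry} & : & (p \le r \wedge p \le s) \Rightarrow ((a \otimes b) \to_{p} c) \to_{q} a \to_{r} b \to_{s} c \\
\mathit{uncurry} & : & (p \le s \wedge q \le s) \Rightarrow (a \to_{p} b \to_{q} c) \to_{r} (a \otimes b) \to_{s} c \\
\mathit{either} & : & (p \le r \wedge q \le r) \Rightarrow (a \to_{p} c) \to_{\omega} (b \to_{q} c) \to_{\omega} \mathsf{Either}\ a\ b \to_{r} c \\
\mathit{foldr} & : & (q \le r \wedge p \le s \wedge q \le s) \Rightarrow (a \to_{p} b \to_{q} b) \to_{\omega} b \to_{r} \mathsf{List}\ a \to_{s} b \\
\mathit{foldl} & : & (p \le r \wedge r \le s \wedge q \le s) \Rightarrow (b \to_{p} a \to_{q} b) \to_{\omega} b \to_{r} \mathsf{List}\ a \to_{s} b \\
\mathit{map} & : & (p \le q) \Rightarrow (a \to_{p} b) \to_{\omega} \mathsf{List}\ a \to_{q} \mathsf{List}\ b \\
\mathit{filter} & : & (a \to_{p} \mathsf{Bool}) \to_{\omega} \mathsf{List}\ a \to_{\omega} \mathsf{List}\ a \\
\mathit{append} & : & \mathsf{List}\ a \to_{p} \mathsf{List}\ a \to_{q} \mathsf{List}\ a \\
\mathit{reverse} & : & \mathsf{List}\ a \to_{p} \mathsf{List}\ a \\
\mathit{concat} & : & \mathsf{List}\ (\mathsf{List}\ a) \to_{p} \mathsf{List}\ a \\
\mathit{concatMap} & : & (p \le q) \Rightarrow (a \to_{p} \mathsf{List}\ b) \to_{\omega} \mathsf{List}\ a \to_{q} \mathsf{List}\ b
\end{array}
\]

Fig. 7. Inferred types for selected functions from Prelude (quantifications are omitted)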


6.2 Functions from Prelude 


We show how our type inference system works for some polymorphic functions
from Haskell's Prelude. Since we have not implemented type classes and I/O
in our prototype implementation and since we can define copying or discarding
functions for concrete first-order datatypes, we focus on the unqualified
polymorphic functions. Also, we do not consider the functions that are obviously
unrestricted, such as head and scanl, in this examination. In the implementation
of the examined functions, we use definitions that are as natural as possible. For example, a
linear-time accumulative definition is used for reverse. Some functions can be
defined by both explicit recursions and foldr/foldl; among the examined functions,
map, filter, concat, and concatMap can be defined by foldr, and reverse can be
defined by foldl. For such cases, both versions are tested.

Fig. 7 shows the inferred types for the examined functions. Since the inferred
types coincide for the two variations (by explicit recursions or by folds) of map,
filter, append, reverse, concat, and concatMap, the results do not refer to these
variations. Most of the inferred types look unsurprising, considering the fact that
the constraint $p \le q$ is yielded usually when an input that corresponds to $q$ is
used in an argument that corresponds to $p$. For example, consider $\mathit{foldr}\ f\ e\ xs$.
The constraint $q \le r$ comes from the fact that $e$ (corresponding to $r$) is passed as
the second argument of $f$ (corresponding to $q$) via a recursive call. The constraint
$p \le s$ comes from the fact that the head of $xs$ (corresponding to $s$) is used as the
first argument of $f$ (corresponding to $p$). The constraint $q \le s$ comes from the
fact that the tail of $xs$ is used in the second argument of $f$. A little explanation
is needed for the constraint $r \le s$ in the type of foldl, where both $r$ and $s$ are
associated with types with the same polarity. Such constraints usually come from
recursive definitions. Consider the definition of foldl:


    foldl = λf.λe.λx.case x of {Nil → e; Cons a y → foldl f (f e a) y}


Here, we find that $a$, a component of $x$ (corresponding to $s$), appears in the
second argument of foldl (corresponding to $r$), which yields the constraint $r \le s$.
Note that the inference results do not contain $\to_{1}$; recall that there is no problem
in using unrestricted inputs linearly, and thus the multiplicity of a linear input
can be arbitrary. The results also show that the inference algorithm successfully
detected that append, reverse, and concat are linear functions.

It is true that these inferred types indeed leak some internal details into their 
constraints, but those constraints can be understood only from their extensional 
behaviors, at least for the examined functions. Thus, we believe that the inferred 
types are reasonably simple. 


6.3 Performance Evaluation 


We measured the elapsed time for type checking and the overhead of implication
checking and quantifier elimination. The following programs were examined in the
experiments: funcs: the functions in Fig. 7, gv: an implementation of a simple
communication in a session-type system GV [17] taken from [18, Section 4] with
some modifications, app1: a pair of the definitions of app and app', and app10:
a pair of the definitions of app and $\mathit{app10} = \lambda f.\lambda x.\ \mathit{app} \dots \mathit{app}\ f\ x$. The former
two programs are intended to be miniatures of typical programs. The latter
two programs are intended to measure the overhead of quantifier elimination.


Table 1. Experimental results (times are measured in ms)

    Program   LOC   Total Elapsed   SAT Elapsed (#)   QE Elapsed (#)
    funcs      40   4.3             0.70 (42)         0.086 (15)
    gv         [row garbled in source: "87 53 ae 0-091 D QILT)"]
    app1        4   0.34            0.047 (4)         0.012 (2)
    app10       4   0.84            0.049 (4)         0.038 (21)

Although the examined programs are very small, they all involve the ambiguity 
issues. For example, consider the following fragment of the program gv: 


    answer : Int = fork prf calculator $ \c -> left c & \c ->
                   send (MkUn 3) c & \c -> send (MkUn 4) c & \c ->
                   recv c & \(MkUn z, c) -> wait c & \() -> MkUn z


(Here, we used our paper’s syntax instead of that of the actual examined code.) 
Here, both $ and & are operator versions of app, where the arguments are flipped 
in &. As well as treatment of multiplicities, the disambiguation is crucial for this 
expression to have type Int. 

The experiments were conducted on a MacBook Pro (13-inch, 2017) with 
Mac OS 10.14.6, 3.5 GHz Intel Core i7 CPU, and 16 GB memory. GHC 8.6.5 
with -O2 was used for compiling our prototype system.

Table 1 lists the experimental results. Each elapsed time is the average of 1,000 
executions for the first two programs, and 10,000 executions for the last two. All 
columns are self-explanatory except for the # column, which counts the number of 


6 We changed the type of fork : Dual s s’ >w (Ch s 41 Ch End) — (Ch s’ >) 
Un r) +1 r, as their type Dual s s’ > (Ch s +1 Ch End) —; Ch 8’ is incorrect for 
the multiplicity erasing semantics. A minor difference is that we used a GADT to 
witness duality because our prototype implementation does not support type classes. 
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executions of corresponding procedures. We note that the current implementation 
restricts Qw in S-ENTAIL to be T and removes redundant constraints afterward. 
This is why the number of SAT solving in app1 is four instead of two. For the 
artificial programs (app1 and app10), the overhead is not significant; typing 
cost grows faster than SAT/QE costs. In contrast, the results for the latter two 
show that SAT becomes heavy for higher-order programs (funcs), and quantifier 
elimination becomes heavy for combinator-heavy programs (gv), although we 
believe that the overhead would still be acceptable. We believe that, since we 
are currently using naive algorithms for both procedures, there is much room 
to reduce the overhead. For example, if users annotate most general types, the
simplification often invokes trivial checks of the form ⋀_i φ_i ⊨ φ_i. Special
treatment for such cases would reduce the overhead.


7 Related Work 


Borrowing the terminology from Bernardy et al. [7], there are two approaches to
linear typing: linearity via arrows and linearity via kinds. The former approaches
manage how many times an assumption (i.e., a variable) can be used; for example,
in Wadler [33]'s linear λ calculus, there are two sorts of variables: linear and
unrestricted, where the latter variables can only be obtained by decomposing
let !x = e1 in e2. Since primitive sources of assumptions are arrow types, it is
natural to annotate them with arguments' multiplicities [7, 12, 22]. For multiplicities,
we focused on 1 and ω following Linear Haskell [6, 7, 26]. Although {1, ω} would
already be useful for some domains including reversible computation [19, 35]
and quantum computation [2, 25], handling more general multiplicities, such
as {0, 1, ω} and arbitrary semirings [12], is an interesting future direction. Our
discussions in Sect. 2 and 3, similarly to Linear Haskell [7], could be extended
to more general domains with small modifications. In contrast, we rely on the
particular domain {1, ω} of multiplicities for the crucial points of our inference,
i.e., entailment checking and quantifier elimination. Igarashi and Kobayashi [14]'s
linearity analysis for the π calculus, which assigns input/output usage (multiplicities)
to channels, has similarity to linearity via arrows. Multiplicity 0 is important in
their analysis to identify input/output only channels. They solve constraints on
multiplicities separately in polynomial time, leveraging monotonicity of
multiplicity operators with respect to the ordering 0 ≤ 1 ≤ ω. Here, 0 ≤ 1 comes from the
fact that 1 in their system means “at-most once” instead of “exactly once”.

The “linearity via kinds” approaches distinguish types of which values are 
treated linearly and types of which values are not [21,24,28], where the distinction 
usually is represented by kinds [21,28]. Interestingly, they also have two function 
types—function types that belong to the linear kind and those that belong to 
the unrestricted kind—because the kind of a function type cannot be determined 
solely by the argument and return types. Mazurak et al. [21] use subkinding to 
avoid explicit conversions from unrestricted values to linear ones. However, due 
to the variations of the function types, a function can have multiple incompatible 
types; e.g., the function const can have four incompatible types [24] in the system. 
Universal types accompanied by kind abstraction [28] address the issue to some
extent; it works well for const, but still gives two incomparable types to the
function composition (∘) [24]. Morris [24] addresses this issue of principality
with qualified typing [15]. Two forms of predicates are considered in the system:
Un τ states that τ belongs to the unrestricted kind, and σ ≤ τ states that
Un σ implies Un τ. This system is considerably simple compared with the
previous systems. Turner et al. [29]’s type-based usage analysis has a similarity 
to the linearity via kinds; in the system, each type is annotated by usage (a 
multiplicity) as (List Int’)”. Wansbrough and Peyton Jones [34] extends the 
system to include polymorphic types and subtyping with respect to multiplicities, 
and have discussions on multiplicity polymorphism. Mogensen [23] is a similar 
line of work, which reduces constraint solving on multiplicities to Horn SAT. 
His system concerns multiplicities {0, 1, ω} with ordering 0 ≤ 1 ≤ ω, and his
constraints can involve more operations including additions and multiplications
but only in the left-hand side of ≤.

Morris [24] uses improving substitutions [16] in generalization, which sometimes
are effective for removing ambiguity, though without showing concrete
algorithms to find them. In our system, as well as S-Eq, elim(∃π.Q) can be
viewed as a systematic way to find improving substitutions. That is, elim(∃π.Q)
improves Q by substituting m with min{ M; | w < M; E€ Pu}, i.e., the largest 
possible candidate of m. Though the largest solution is usually undesirable, espe- 
cially when the right-hand sides of < are all singletons, we can also view that 
elim(47.Q) substitutes 7 by skies, li, i.e., the smallest possible candidate. 


8 Conclusion 


We designed a type inference system for a rank 1 fragment of λ^q_→ [7] that can infer
principal types based on the qualified typing system OutsideIn(X) [31]. We
observed that naive qualified typing infers ambiguous types often and addressed 
the issue based on quantifier elimination. The experiments suggested that the 
proposed inference system infers principal types effectively, and the overhead 
compared with unrestricted typing is acceptable, though not negligible. 

Since we based our work on the inference algorithm used in GHC, the natural 
expectation is to implement the system into GHC. A technical challenge to achieve 
this is combining the disambiguation techniques with other sorts of constraints, 
especially type classes, and arbitrarily ranked polymorphism. 
Abstract. Reduction to the satisfiability problem for constrained Horn
clauses (CHCs) is a widely studied approach to automated program veri- 
fication. The current CHC-based methods for pointer-manipulating pro- 
grams, however, are not very scalable. This paper proposes a novel trans- 
lation of pointer-manipulating Rust programs into CHCs, which clears 
away pointers and heaps by leveraging ownership. We formalize the trans- 
lation for a simplified core of Rust and prove its correctness. We have 
implemented a prototype verifier for a subset of Rust and confirmed the 
effectiveness of our method. 


1 Introduction 


Reduction to constrained Horn clauses (CHCs) is a widely studied approach to 
automated program verification [22,6]. A CHC is a Horn clause [30] equipped
with constraints, namely a formula of the form $\varphi \Longleftarrow \psi_0 \land \cdots \land \psi_{k-1}$, where $\varphi$
and $\psi_0, \dots, \psi_{k-1}$ are either an atomic formula of the form $f(t_0, \dots, t_{n-1})$ ($f$ is
a predicate variable and $t_0, \dots, t_{n-1}$ are terms), or a constraint (e.g. $a < b + 1$).¹
We call a finite set of CHCs a CHC system or sometimes just CHC. CHC solving 
is an act of deciding whether a given CHC system S has a model, i.e. a valuation 
for predicate variables that makes all the CHCs in S valid. A variety of program 
verification problems can be naturally reduced to CHC solving. 

For example, let us consider the following C code that defines McCarthy’s 
91 function. 


int mc91(int n) { 
if (n > 100) return n - 10; else return mc91(mc91(n + 11)); 
} 


Suppose that we wish to prove mc91(n) returns 91 whenever n ≤ 101 (if it
terminates). The wished property is equivalent to the satisfiability of the following
CHCs, where Mc91(n, r) means that mc91(n) returns r if it terminates.

\[ \mathit{Mc91}(n, r) \Longleftarrow n > 100 \land r = n - 10 \]


* The full version of this paper is available as [47]. 
1 Free variables are universally quantified. Terms and variables are governed under
sorts (e.g. int, bool), which are made explicit in the formalization of § 3. 
\[ \mathit{Mc91}(n, r) \Longleftarrow n \le 100 \land \mathit{Mc91}(n + 11, \mathit{res}') \land \mathit{Mc91}(\mathit{res}', r) \]
\[ r = 91 \Longleftarrow n \le 101 \land \mathit{Mc91}(n, r) \]

The property can be verified because this CHC system has a model:
\[ \mathit{Mc91}(n, r) :\Longleftrightarrow r = 91 \lor (n > 100 \land r = n - 10). \]
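
The claim that this interpretation is a model can be sanity-checked by brute force on a bounded range. The following Rust sketch is only an illustration, not a CHC solver: the names mc91_model and clauses_hold and the finite range are assumptions of the sketch, and a bounded check is of course weaker than the unbounded validity a solver establishes.

```rust
/// McCarthy's 91 function, mirroring the C code above.
fn mc91(n: i64) -> i64 {
    if n > 100 { n - 10 } else { mc91(mc91(n + 11)) }
}

/// Candidate model: Mc91(n, r) holds iff r = 91 or (n > 100 and r = n - 10).
fn mc91_model(n: i64, r: i64) -> bool {
    r == 91 || (n > 100 && r == n - 10)
}

/// Checks the three clauses for every assignment of the variables drawn from
/// `lo..=hi`. `model` is the interpretation given to the predicate Mc91.
fn clauses_hold(model: fn(i64, i64) -> bool, lo: i64, hi: i64) -> bool {
    for n in lo..=hi {
        for r in lo..=hi {
            // Mc91(n, r) <== n > 100 /\ r = n - 10
            if n > 100 && r == n - 10 && !model(n, r) {
                return false;
            }
            // r = 91 <== n <= 101 /\ Mc91(n, r)
            if n <= 101 && model(n, r) && r != 91 {
                return false;
            }
            // Mc91(n, r) <== n <= 100 /\ Mc91(n + 11, res) /\ Mc91(res, r)
            for res in lo..=hi {
                if n <= 100 && model(n + 11, res) && model(res, r) && !model(n, r) {
                    return false;
                }
            }
        }
    }
    true
}
```

Calling clauses_hold(mc91_model, 0, 150) exercises all three clauses on that range; passing a weaker candidate such as "r = 91" alone violates the first clause, which is why the second disjunct of the model is needed.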

A CHC solver provides a common infrastructure for a variety of programming 
languages and properties to be verified. There have been effective CHC solvers 
[40,18,29,12] that can solve instances obtained from actual programs² and many
program verification tools [23,37,25,28,38,60] use a CHC solver as a backend. 

However, the current CHC-based methods do not scale very well for programs 
using pointers, as we see in § 1.1. We propose a novel method to tackle this 
problem for pointer-manipulating programs under Rust-style ownership, as we 
explain in § 1.2. 


1.1 Challenges in Verifying Pointer-Manipulating Programs 


The standard CHC-based approach [23] for pointer-manipulating programs rep- 
resents the memory state as an array, which is passed around as an argument 
of each predicate (cf. the store-passing style), and a pointer as an index. 

For example, a pointer-manipulating variation of the previous program 


void mc91p(int n, int* r) {
  if (n > 100) *r = n - 10;
  else { int s; mc91p(n + 11, &s); mc91p(s, r); }
}


is translated into the following CHCs by the array-based approach:³
\[ \mathit{Mc91p}(n, r, h, h') \Longleftarrow n > 100 \land h' = h\{r \leftarrow n - 10\} \]
\[ \mathit{Mc91p}(n, r, h, h') \Longleftarrow n \le 100 \land \mathit{Mc91p}(n + 11, s, h, h'') \land \mathit{Mc91p}(h''[s], r, h'', h') \]
\[ h'[r] = 91 \Longleftarrow n \le 101 \land \mathit{Mc91p}(n, r, h, h'). \]
Mc91p additionally takes two arrays h, h' representing the (heap) memory states
before/after the call of mc91p. The second argument r of Mc91p, which
corresponds to the pointer argument r in the original program, is an index for the
arrays. Hence, the assignment *r = n - 10 is modeled in the first CHC as an
update of the r-th element of the array. This CHC system has a model
\[ \mathit{Mc91p}(n, r, h, h') :\Longleftrightarrow h'[r] = 91 \lor (n > 100 \land h'[r] = n - 10), \]
which can be found by some array-supporting CHC solvers including Spacer [40], 
thanks to evolving SMT-solving techniques for arrays [62,10]. 
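
To make the store-passing reading concrete, the following Rust sketch threads the heap through mc91p as an explicit array and treats a pointer as an index. It is a minimal illustration under one assumption that the CHCs above do not make: the local variable s is given a fresh cell appended at the end of the array, whereas in the clauses s is an arbitrary index.

```rust
/// Store-passing version of mc91p: the heap `h` is an array of cells and a
/// pointer is an index into it. The returned vector is the heap after the call.
fn mc91p(n: i64, r: usize, mut h: Vec<i64>) -> Vec<i64> {
    if n > 100 {
        h[r] = n - 10; // h' = h{r <- n - 10}
        h
    } else {
        // Assumption of this sketch: `s` lives in a fresh cell at the end.
        let s = h.len();
        h.push(0);
        let h2 = mc91p(n + 11, s, h); // Mc91p(n + 11, s, h, h'')
        let v = h2[s];                // h''[s]
        mc91p(v, r, h2)               // Mc91p(h''[s], r, h'', h')
    }
}
```

For instance, mc91p(42, 0, vec![0]) returns a heap whose cell 0 holds 91, matching the third clause; every cell allocated for s along the way stays in the returned array, which hints at why models of such systems must talk about whole arrays.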
However, the array-based approach has some shortcomings. Let us consider, 
for example, the following innocent-looking code.⁴


2 For example, the above CHC system on Mc91 can be solved instantly by many CHC
solvers including Spacer [40] and HoIce [12].

3 $h\{r \leftarrow v\}$ is the array made from h by replacing the value at index r with v. h[r] is
the value of array h at index r.


4 rand() is a non-deterministic function that can return any integer value. 
bool just_rec(int* ma) {
  if (rand() >= 0) return true;
  int old_a = *ma; int b = rand(); just_rec(&b);
  return (old_a == *ma);
}


It can immediately return true; or it recursively calls itself and checks if the 
target of ma remains unchanged through the recursive call. In effect this function 
does nothing on the allocated memory blocks, although it can possibly modify 
some of the unused parts of the memory. 

Suppose we wish to verify that just_rec never returns false. The standard 
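CHC-based verifier for C, SeaHorn [23], generates a CHC system like below:⁵ ⁶

\[ \mathit{JustRec}(ma, h, h', r) \Longleftarrow h' = h \land r = \mathit{true} \]
\[ \mathit{JustRec}(ma, h, h', r) \Longleftarrow mb \ne ma \land h'' = h\{mb \leftarrow b\} \land \mathit{JustRec}(mb, h'', h', r') \land r = (h[ma] \mathbin{==} h'[ma]) \]
\[ r = \mathit{true} \Longleftarrow \mathit{JustRec}(ma, h, h', r) \]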


Unfortunately the CHC system above is not satisfiable and thus SeaHorn issues 
a false alarm. This is because, in this formulation, mb may not necessarily be 
completely fresh; it is assumed to be different from the argument ma of the 
current call, but may coincide with ma of some deep ancestor calls.⁷

The simplest remedy would be to explicitly specify the way of memory allo- 
cation. For example, one can represent the memory state as a pair of an array h 
and an index sp indicating the maximum index that has been allocated so far. 


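\[ \mathit{JustRec}_+(ma, h, sp, h', sp', r) \Longleftarrow h' = h \land sp' = sp \land r = \mathit{true} \]
\[ \mathit{JustRec}_+(ma, h, sp, h', sp', r) \Longleftarrow mb = sp'' = sp + 1 \land h'' = h\{mb \leftarrow b\} \land \mathit{JustRec}_+(mb, h'', sp'', h', sp', r') \land r = (h[ma] \mathbin{==} h'[ma]) \]
\[ r = \mathit{true} \Longleftarrow \mathit{JustRec}_+(ma, h, sp, h', sp', r) \land ma \le sp \]

The resulting CHC system now has a model, but it involves quantifiers:
\[ \mathit{JustRec}_+(ma, h, sp, h', sp', r) :\Longleftrightarrow r = \mathit{true} \land \forall i \le sp.\ h[i] = h'[i] \]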


Finding quantified invariants is known to be difficult in general despite ac- 
tive studies on it [41,2,36,26,19] and most current array-supporting CHC solvers 
give up finding quantified invariants. In general, much more complex operations 
on pointers can naturally take place, which makes the universally quantified in- 
variants highly involved and hard to automatically find. To avoid complexity of 
models, CHC-based verification tools [23,24,37] tackle pointers by pointer anal- 
ysis [61,43]. Although it does have some effects, the current applicable scope of 
pointer analysis is quite limited. 


5 ==, !=, >=, && denote binary operations that return boolean values.
6 We omitted the allocation for old_a for simplicity.
7 Precisely speaking, SeaHorn tends to even omit shallow address-freshness checks like
mb ≠ ma.
1.2 Our Approach: Leverage Rust’s Ownership System


This paper proposes a novel approach to CHC-based verification of pointer- 
manipulating programs, which makes use of ownership information to avoid an 
explicit representation of the memory. 


Rust-style Ownership. Various styles of ownership/permission/capability have 
been introduced to control and reason about usage of pointers on programming 
language design, program analysis and verification [13,31,8,31,9,7,64,63]. In what 
follows, we focus on the ownership in the style of the Rust programming language 
[46,55]. 

Roughly speaking, the ownership system guarantees that, for each memory 
cell and at each point of program execution, either (i) only one alias has the 
update (write & read) permission to the cell, with any other alias having no
permission to it, or (ii) some (or no) aliases have the read permission to the cell, 
with no alias having the update permission to it. In summary, when an alias 
can read some data (with an update/read permission), any other alias cannot 
modify the data. 
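
The two cases can be seen in a few lines of safe Rust. The sketch below is only an illustration of the rule as stated above; the function name ownership_demo is an assumption of the sketch.

```rust
/// Returns (sum read through two shared aliases, value after an exclusive update).
fn ownership_demo() -> (i32, i32) {
    let mut cell = 5;

    // Case (ii): several aliases hold the read permission; nobody may write.
    let observed = {
        let r1 = &cell;
        let r2 = &cell;
        *r1 + *r2
    };

    // Case (i): exactly one alias holds the update permission.
    {
        let w = &mut cell;
        *w += 1;
        // Reading `cell` directly at this point, while `w` is still used
        // afterwards, would be rejected by the borrow checker.
    }

    (observed, cell)
}
```

The compiler accepts this function because the shared borrows r1 and r2 end before the mutable borrow w begins, and w ends before cell is read again.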

As a running example, let us consider the program below, which follows 
Rust’s ownership discipline (it is written in the C style; the Rust version is 
presented at Example 1): 


int* take_max(int* ma, int* mb) {
  if (*ma >= *mb) return ma; else return mb;
}
bool inc_max(int a, int b) {
  {
    int* mc = take_max(&a, &b); // borrow a and b
    *mc += 1;
  } // end of borrow
  return (a != b);
}


Figure 1 illustrates which alias has the update permission to the contents of a
and b during the execution of inc_max(5,3).

A notable feature is borrow. In the running example, when the pointers &a 
and &b are taken for take_max, the update permissions of a and b are temporarily 
transferred to the pointers. The original variables, a and b, lose the ability to 
access their contents until the end of borrow. The function take_max returns a 
pointer having the update permission until the end of borrow, which justifies the 
update operation *mc += 1. In this example, the end of borrow is at the end of 
the inner block of inc_max. At this point, the permissions are given back to the 
original variables a and b, allowing to compute a != b. Note that mc can point 
to a and also to b and that this choice is determined dynamically. The values of 
a and b after the borrow depend on the behavior of the pointer mc. 

The end of each borrow is statically managed by a lifetime. See § 2 for a more 
precise explanation of ownership, borrow and lifetimes. 
[Figure 1: permission timelines of a, b, ma, mb and mc over the phases (i) to (iv), which are delimited by the call of take_max, the return of take_max, and the end of borrowing.]

Fig. 1. Values and aliases of a and b in evaluating inc_max(5,3). Each line shows 
each variable’s permission timeline: a solid line expresses the update permission and a 
bullet shows a point when the borrowed permission is given back. For example, b has 
the update permission to its content during (i) and (iv), but not during (ii) and (iii) 
because the pointer mb, created at the call of take_max, borrows b until the end of (iii). 


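Key Idea. The key idea of our method is to represent a pointer ma as a pair $\langle a, a_\circ \rangle$
of the current target value $a$ and the target value $a_\circ$ at the end of borrow.⁸ ⁹ This
representation employs access to the future information (it is related to prophecy
variables; see § 5). This simple idea turns out to be very powerful.

In our approach, the verification problem “Does inc_max always return true?”
is reduced to the satisfiability of the following CHCs:

\[ \mathit{TakeMax}(\langle a, a_\circ \rangle, \langle b, b_\circ \rangle, r) \Longleftarrow a \ge b \land b_\circ = b \land r = \langle a, a_\circ \rangle \]
\[ \mathit{TakeMax}(\langle a, a_\circ \rangle, \langle b, b_\circ \rangle, r) \Longleftarrow a < b \land a_\circ = a \land r = \langle b, b_\circ \rangle \]
\[ \mathit{IncMax}(a, b, r) \Longleftarrow \mathit{TakeMax}(\langle a, a_\circ \rangle, \langle b, b_\circ \rangle, \langle c, c_\circ \rangle) \land c' = c + 1 \land c_\circ = c' \land r = (a_\circ \mathbin{!=} b_\circ) \]
\[ r = \mathit{true} \Longleftarrow \mathit{IncMax}(a, b, r). \]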


The mutable reference ma is now represented as $\langle a, a_\circ \rangle$, and similarly for mb and
mc. The first CHC models the then-clause of take_max: the return value is ma,
which is expressed as $r = \langle a, a_\circ \rangle$; in contrast, mb is released, which constrains
$b_\circ$, the value of b at the end of borrow, to the current value $b$. In the clause on
IncMax, mc is represented as a pair $\langle c, c_\circ \rangle$. The constraint $c' = c + 1 \land c_\circ = c'$
models the increment of mc (in the phase (iii) in Fig. 1). Importantly, the final
check a != b is simply expressed as $a_\circ \mathbin{!=} b_\circ$; the updated values of a/b are
available as $a_\circ$/$b_\circ$. Clearly, the CHC system above has a simple model.
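
The way the future values are pinned down by the constraints can be replayed by brute force. The Rust sketch below is an illustration only: the struct MutRef, the function names, and the finite search range for the prophesied values are assumptions of the sketch. It enumerates candidate end-of-borrow values for a and b, keeps the assignments that satisfy the bodies of the TakeMax and IncMax clauses, and collects the resulting r.

```rust
/// A mutable reference modeled as (current value, value at the end of borrow).
#[derive(Clone, Copy)]
struct MutRef {
    cur: i64,
    fin: i64,
}

/// The two TakeMax clauses: returns the chosen reference if the constraint on
/// the released reference (its final value equals its current value) holds.
fn take_max(ma: MutRef, mb: MutRef) -> Option<MutRef> {
    if ma.cur >= mb.cur {
        if mb.fin == mb.cur { Some(ma) } else { None }
    } else if ma.fin == ma.cur {
        Some(mb)
    } else {
        None
    }
}

/// All values of r derivable by the IncMax clause for inputs a and b, with the
/// prophesied final values ranging over `lo..=hi`.
fn inc_max_results(a: i64, b: i64, lo: i64, hi: i64) -> Vec<bool> {
    let mut out = Vec::new();
    for a_fin in lo..=hi {
        for b_fin in lo..=hi {
            let ma = MutRef { cur: a, fin: a_fin };
            let mb = MutRef { cur: b, fin: b_fin };
            if let Some(mc) = take_max(ma, mb) {
                let c_new = mc.cur + 1; // c' = c + 1
                if mc.fin == c_new {
                    // c_o = c'
                    out.push(a_fin != b_fin); // r = (a_o != b_o)
                }
            }
        }
    }
    out
}
```

For a = 5 and b = 3 the only surviving assignment has final values 6 and 3, in line with footnote 9, and the derived r is true.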
Also, the just_rec example in § 1.1 can be encoded as a CHC system 


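\[ \mathit{JustRec}(\langle a, a_\circ \rangle, r) \Longleftarrow a_\circ = a \land r = \mathit{true} \]
\[ \mathit{JustRec}(\langle a, a_\circ \rangle, r) \Longleftarrow mb = \langle b, b_\circ \rangle \land \mathit{JustRec}(mb, r') \land a_\circ = a \land r = (a \mathbin{==} a_\circ) \]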


8 Precisely, this is the representation of a pointer with a borrowed update permission
(i.e. mutable reference). Other cases are discussed in § 3.

9 For example, in the case of Fig. 1, when take_max is called, the pointer ma is $\langle 5, 6 \rangle$
and mb is $\langle 3, 3 \rangle$.
\[ r = \mathit{true} \Longleftarrow \mathit{JustRec}(\langle a, a_\circ \rangle, r). \]


Now it has a simple model: $\mathit{JustRec}(\langle a, a_\circ \rangle, r) :\Longleftrightarrow r = \mathit{true} \land a_\circ = a$.
Remarkably, arrays and quantified formulas are not required to express the model,
which allows the CHC system to be easily solved by many CHC solvers. More
advanced examples are presented in § 3.4, including one with destructive update 
on a singly-linked list. 


Contributions. Based on the above idea, we formalize the translation from pro- 
grams to CHC systems for a core language of Rust, prove correctness (both 
soundness and completeness) of the translation, and confirm the effectiveness 
of our approach through preliminary experiments. The core language supports, 
among others, recursive types. Remarkably, our approach enables us to automat- 
ically verify some properties of a program with destructive updates on recursive 
data types such as lists and trees. 

The rest of the paper is structured as follows. In § 2, we provide a formalized 
core language of Rust supporting recursions, lifetime-based ownership and recur- 
sive types. In §3, we formalize our translation from programs to CHCs and prove 
its correctness. In § 4, we report on the implementation and the experimental 
results. In § 5 we discuss related work and in § 6 we conclude the paper. 


2 Core Language: Calculus of Ownership and Reference 


We formalize a core of Rust as Calculus of Ownership and Reference (COR), 
whose design has been affected by the safe layer of λRust in the RustBelt paper
[32]. It is a typed procedural language with a Rust-like ownership system. 


2.1 Syntax 
The following is the syntax of COR. 


(program)             Π ::= F_0 ··· F_{n-1}
(function definition) F ::= fn f Σ {L_0: S_0 ··· L_{n-1}: S_{n-1}}
(function signature)  Σ ::= ⟨α_0, ..., α_{m-1} | α_{a_0} ≤ α_{b_0}, ..., α_{a_{l-1}} ≤ α_{b_{l-1}}⟩
                            (x_0: T_0, ..., x_{n-1}: T_{n-1}) → U
(statement)           S ::= I; goto L | return x
                          | match *x {inj_0 *y_0 → goto L_0, inj_1 *y_1 → goto L_1}
(instruction)         I ::= let y = mutbor_α x | drop x | immut x | swap(*x, *y)
                          | let *y = x | let y = *x | let *y = copy *x | x as T
                          | let y = f⟨α_0, ..., α_{m-1}⟩(x_0, ..., x_{n-1})
                          | intro α | now α | α ≤ β
                          | let *y = const | let *y = *x op *x' | let *y = rand()
                          | let *y = inj_i^{T_0+T_1} *x | let *y = (*x_0, *x_1) | let (*y_0, *y_1) = *x
(type)                T, U ::= X | μX.T | P T | T_0 + T_1 | T_0 × T_1 | int | unit
(pointer kind)        P ::= own | R_α      (reference kind) R ::= mut | immut

α, β, γ ::= (lifetime variable)    X, Y ::= (type variable)
x, y ::= (variable)    f, g ::= (function name)    L ::= (label)
const ::= n | ()    bool := unit + unit    op ::= op_int | op_bool
op_int ::= + | - | ···    op_bool ::= >= | == | != | ···


Program, Function and Label. A program (denoted by Π) is a set of function
definitions. A function definition (F) consists of a function name, a function
signature and a set of labeled statements (L: S). In COR, for simplicity, the
input/output types of a function are restricted to pointer types. A function is
parametrized over lifetime parameters under constraints; polymorphism on types
is not supported for simplicity, just as λRust. For the lifetime parameter receiver,
often ⟨α_0, ··· |⟩ is abbreviated to ⟨α_0, ...⟩ and ⟨|⟩ is omitted.

A label (L) is an abstract program point to be jumped to by goto.¹⁰ Each
label is assigned a whole context by the type system, as we see later. This style,
with unstructured control flows, helps the formal description of CHCs in § 3.2. A
function should have the label entry (entry point), and every label in a function
should be syntactically reachable from entry by goto jumps.¹¹


Statement and Instruction. A statement (S) performs an instruction with a jump
(I; goto L), returns from a function (return x), or branches (match *x {···}).

An instruction (I) performs an elementary operation: mutable (re)borrow
(let y = mutbor_α x), releasing a variable (drop x), weakening ownership (immut
x),¹² swap (swap(*x, *y)), creating/dereferencing a pointer (let *y = x, let y =
*x), copy (let *y = copy *x),¹³ type weakening (x as T), function call (let y =
f⟨···⟩(···)), lifetime-related ghost operations (intro α, now α, α ≤ β; explained
later), getting a constant / operation result / random integer (let *y = const /
*x op *x' / rand()), creating a variant (let *y = inj_i^{T_0+T_1} *x), and
creating/destructing a pair (let *y = (*x_0, *x_1), let (*y_0, *y_1) = *x). An instruction of form
let *y = ··· implicitly allocates new memory cells as y; also, some
instructions deallocate memory cells implicitly. For simplicity, every variable is
designed to be a pointer and every release of a variable should be explicitly
annotated by ‘drop x’. In addition, we provide swap instead of assignment; the
usual assignment (of copyable data from *x to *y) can be expressed by let *x' =
copy *x; swap(*y, *x'); drop x'.
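
The copy-swap-drop encoding of assignment has a direct counterpart in Rust. The sketch below is illustrative only; the function name assign_via_swap is an assumption, and Box is used to mirror the fact that every COR variable is a pointer to freshly allocated memory.

```rust
use std::mem;

/// Assigns the copyable value behind `x` to the cell behind `y`, using only
/// the three steps named in the text: copy, swap, drop.
fn assign_via_swap(x: &i32, y: &mut i32) {
    let mut x_copy: Box<i32> = Box::new(*x); // let *x' = copy *x
    mem::swap(y, &mut *x_copy);              // swap(*y, *x')
    drop(x_copy);                            // drop x' (holds the old value of *y)
}
```

After the call, the cell behind y holds the value of x, x is unchanged, and the old content of y has been released together with the temporary.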


Type. As a type (T), we support recursive types (μX.T), pointer types (P T),
variant types (T_0 + T_1), pair types (T_0 × T_1) and basic types (int, unit).

A pointer type P T can be an owning pointer own T (Box<T> in Rust), mutable
reference mut_α T (&'a mut T) or immutable reference immut_α T (&'a T). An


10 It is related to a continuation introduced by letcont in λRust.

11 Here ‘syntactically’ means that detailed information such as a branch condition
on match or non-termination is ignored.

12 This instruction turns a mutable reference to an immutable reference. Using this, an
immutable borrow from x to y can be expressed by let y = mutbor_α x; immut y.

13 Copying a pointer (an immutable reference) x to y can be expressed by let *ox =
x; let *oy = copy *ox; let y = *oy.
owning pointer has data in the heap memory, can freely update the data (unless
it is borrowed), and has the obligation to clean up the data from the heap
memory. In contrast, a mutable/immutable reference (or unique/shared reference)
borrows an update/read permission from an owning pointer or another
reference with the deadline of a lifetime α (introduced later). A mutable
reference cannot be copied, while an immutable reference can be freely copied. A
reference loses the permission at the time when it is released.¹⁴

A type T that appears in a program (not just as a substructure of some type)
should satisfy the following condition (if it holds we say the type is complete):
every type variable X in T is bound by some μ and guarded by a pointer
constructor (i.e. given a binding of form μX.U, every occurrence of X in U is a part
of a pointer type, of form P U').
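
Rust imposes a similar guardedness condition on recursive data types: the recursive occurrence must sit behind a pointer. As an illustration (the type List and the two helper functions are assumptions of this sketch), the COR type μX. unit + int × own X roughly corresponds to the following definition, where Box plays the role of own.

```rust
/// A singly-linked list of integers; the recursive occurrence is guarded by
/// the owning pointer `Box`. Writing `Cons(i32, List)` instead is rejected by
/// the compiler because the type would have infinite size.
enum List {
    Nil,
    Cons(i32, Box<List>),
}

/// Sums the elements through an immutable reference.
fn sum(list: &List) -> i32 {
    match list {
        List::Nil => 0,
        List::Cons(head, tail) => *head + sum(tail),
    }
}

/// Destructively increments every element through a mutable reference.
fn inc_all(list: &mut List) {
    if let List::Cons(head, tail) = list {
        *head += 1;
        inc_all(tail);
    }
}
```

The second function is a small instance of the destructive updates on recursive data types that the paper targets.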
Lifetime. A lifetime is an abstract time point in the process of computation,¹⁵
which is statically managed by lifetime variables α. A lifetime variable can be a
lifetime parameter that a function takes or a local lifetime variable introduced
within a function. We have three lifetime-related ghost instructions: intro α
introduces a new local lifetime variable, now α sets a local lifetime variable to
the current moment and eliminates it, and α ≤ β asserts the ordering on local
lifetime variables.


Expressivity and Limitations. COR can express most borrow patterns in the 
core of Rust. The set of moments when a borrow is active forms a continuous 
time range, even under non-lexical lifetimes [54].¹⁶

A major limitation of COR is that it does not support unsafe code blocks and 
also lacks type traits and closures. Still, our idea can be combined with unsafe 
code and closures, as discussed in §3.5. Another limitation of COR is that, unlike 
Rust and λRust, we cannot directly modify/borrow a fragment of a variable (e.g.
an element of a pair). Still, we can eventually modify/borrow a fragment by 
borrowing the whole variable and splitting pointers (e.g. ‘let (*yo, *y1) = *x’). 
This borrow-and-split strategy, nevertheless, yields a subtle obstacle when we 
extend the calculus for advanced data types (e.g. get_default in ‘Problem Case 
#3’ from [54]). For future work, we pursue a more expressive calculus modeling 
Rust and extend our verification method to it. 
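
The borrow-and-split strategy mentioned above can be mimicked in Rust by borrowing a whole pair and then splitting the reference into references to its components. This is a minimal sketch; the function name split_pair is an assumption.

```rust
/// Splits a mutable reference to a pair into mutable references to its two
/// components, in the spirit of `let (*y0, *y1) = *x`.
fn split_pair(x: &mut (i32, i32)) -> (&mut i32, &mut i32) {
    // Destructuring through `&mut` binds each component by mutable reference.
    let (y0, y1) = x;
    (y0, y1)
}
```

Both resulting references can be updated independently while the pair as a whole stays borrowed; once they go out of scope the pair is accessible again.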


Example 1 (COR Program). The following program expresses the functions 
take_max and inc_max presented in § 1.2. We shorthand sequential executions 


14 In Rust, even after a reference loses the permission and the lifetime ends, its address
data can linger in the memory, although dereferencing on the reference is no longer 
allowed. We simplify the behavior of lifetimes in COR. 

15 In the terminology of Rust, a lifetime often means a time range where a borrow is 
active. To simplify the discussions, however, we in this paper use the term lifetime 
to refer to a time point when a borrow ends. 

16 Strictly speaking, this property is broken by recently adopted implicit two-phase 
borrows [59,53]. However, by shallow syntactical reordering, a program with implicit 
two-phase borrows can be fit into usual borrow patterns. 
by ‘;^L’ (e.g. L0: I0 ;^L1 I1; goto L2 stands for L0: I0; goto L1  L1: I1; goto L2).¹⁷

fn take-max ⟨α⟩ (ma: mut_α int, mb: mut_α int) → mut_α int {
  entry: let *ord = *ma >= *mb ;^L1 match *ord {inj_1 *ou → goto L2, inj_0 *ou → goto L5}
  L2: drop ou ;^L3 drop mb ;^L4 return ma    L5: drop ou ;^L6 drop ma ;^L7 return mb
}
fn inc-max(oa: own int, ob: own int) → own bool {
  entry: intro α ;^L1 let ma = mutbor_α oa ;^L2 let mb = mutbor_α ob ;^L3
    let mc = take-max⟨α⟩(ma, mb) ;^L4 let *o1 = 1 ;^L5 let *oc' = *mc + *o1 ;^L6 drop o1 ;^L7
    swap(mc, oc') ;^L8 drop oc' ;^L9 drop mc ;^L10 now α ;^L11 let *or = *oa != *ob ;^L12
    drop oa ;^L13 drop ob ;^L14 return or
}


In take-max, conditional branching is performed by match and its goto directions 
(at L1). In inc-max, increment on the mutable reference mc is performed by 
calculating the new value (at L4, L5) and updating the data by swap (at L7). 

The following is the corresponding Rust program, with ghost annotations 
(marked italic and dark green, e.g. drop ma) on lifetimes and releases of mutable 
references. 


fn take_max<'a>(ma: &'a mut i32, mb: &'a mut i32) -> &'a mut i32 {
  if *ma >= *mb { drop mb; ma } else { drop ma; mb }
}
fn inc_max(mut a: i32, mut b: i32) -> bool {
  { intro 'a;
    let mc = take_max<'a>(&'a mut a, &'a mut b); *mc += 1;
    drop mc; now 'a; }
  a != b
}


2.2 Type System 


The type system of COR assigns to each label a whole context $(\Gamma, \mathbf{A})$. We define
below the whole context and the typing judgments.


Context. A variable context $\Gamma$ is a finite set of items of form $x{:}^{\mathbf{a}}\, T$, where $T$
should be a complete pointer type and $\mathbf{a}$ (which we call activeness) is of form
‘active’ or ‘$\dagger\alpha$’ (frozen until lifetime $\alpha$). We abbreviate $x{:}^{\text{active}}\, T$ as $x{:}\, T$. A
variable context should not contain two items on the same variable. A lifetime
context $\mathbf{A} = (A, R)$ is a finite preordered set of lifetime variables, where $A$ is the
underlying set and $R$ is the preorder. We write $|\mathbf{A}|$ and $\le_{\mathbf{A}}$ to refer to $A$ and $R$.
Finally, a whole context $(\Gamma, \mathbf{A})$ is a pair of a variable context $\Gamma$ and a lifetime
context $\mathbf{A}$ such that every lifetime variable in $\Gamma$ is contained in $\mathbf{A}$.


17 The first character of each variable indicates the pointer kind (o/m corresponds to
own/mut_α). We swap the branches of the match statement in take-max, to fit the
order to C/Rust’s if.
Notations. The set operation $A + B$ (or more generally $\sum_\lambda A_\lambda$) denotes the
disjoint union, i.e. the union defined only if the arguments are disjoint. The set
operation $A - B$ denotes the set difference defined only if $A \supseteq B$. For a natural
number $n$, $[n]$ denotes the set $\{0, \dots, n-1\}$.

Generally, an auxiliary definition for a rule can be presented just below, 
possibly in a dotted box. 


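Program and Function. The rules for typing programs and functions are
presented below. They assign to each label a whole context $(\Gamma, \mathbf{A})$. ‘$S :_{\Pi,f} (\Gamma, \mathbf{A}) \mid (\Gamma_L, \mathbf{A}_L)_L \mid U$’
is explained later.

\[ \frac{\text{for any } F \text{ in } \Pi,\quad F :_{\Pi} (\Gamma_{\mathrm{name}(F),L}, \mathbf{A}_{\mathrm{name}(F),L})_{L \in \mathrm{Label}_F}}{\Pi : (\Gamma_{f,L}, \mathbf{A}_{f,L})_{(f,L) \in \mathrm{FnLabel}_\Pi}} \]

name(F): the function name of F;  Label_F: the set of labels in F
FnLabel_Π: the set of pairs (f, L) such that a function f in Π has a label L

\[ \frac{\begin{gathered}
F = \mathsf{fn}\ f\ \langle \alpha_0, \dots, \alpha_{m-1} \mid \alpha_{a_0} \le \alpha_{b_0}, \dots, \alpha_{a_{l-1}} \le \alpha_{b_{l-1}} \rangle\ (x_0{:}\, T_0, \dots, x_{n-1}{:}\, T_{n-1}) \to U\ \{\cdots\} \\
\Gamma_{\mathrm{entry}} = \{ x_i{:}\, T_i \mid i \in [n] \} \qquad A = \{ \alpha_j \mid j \in [m] \} \qquad \mathbf{A}_{\mathrm{entry}} = \bigl(A, (\mathrm{Id}_A \cup \{ (\alpha_{a_k}, \alpha_{b_k}) \mid k \in [l] \})^{+}\bigr) \\
\text{for any } L'{:}\, S \in \mathrm{LabelStmt}_F,\quad S :_{\Pi,f} (\Gamma_{L'}, \mathbf{A}_{L'}) \mid (\Gamma_L, \mathbf{A}_L)_{L \in \mathrm{Label}_F} \mid U
\end{gathered}}{F :_{\Pi} (\Gamma_L, \mathbf{A}_L)_{L \in \mathrm{Label}_F}} \]

LabelStmt_F: the set of labeled statements in F
Id_A: the identity relation on A;  R^+: the transitive closure of R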

On the rule for the function, the initial whole context at entry is specified 
(the second and third preconditions) and also the contexts for other labels are 
checked (the fourth precondition). The context for each label (in each function) 
can actually be determined in the order by the distance in the number of goto 
jumps from entry, but that order is not very obvious because of unstructured 
control flows. 


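Statement. ‘$S :_{\Pi,f} (\Gamma, \mathbf{A}) \mid (\Gamma_L, \mathbf{A}_L)_L \mid U$’ means that running the statement S
(under Π, f) with the whole context $(\Gamma, \mathbf{A})$ results in a jump to a label with the
whole contexts specified by $(\Gamma_L, \mathbf{A}_L)_L$ or a return of data of type U. Its rules
are presented below. ‘$I :_{\Pi,f} (\Gamma, \mathbf{A}) \to (\Gamma', \mathbf{A}')$’ is explained later.

\[ \frac{I :_{\Pi,f} (\Gamma, \mathbf{A}) \to (\Gamma_{L_0}, \mathbf{A}_{L_0})}{I;\ \mathsf{goto}\ L_0 :_{\Pi,f} (\Gamma, \mathbf{A}) \mid (\Gamma_L, \mathbf{A}_L)_L \mid U}
\qquad
\frac{\Gamma = \{ x{:}\, U \} \qquad |\mathbf{A}| = A_{\mathrm{ex}\,\Pi,f}}{\mathsf{return}\ x :_{\Pi,f} (\Gamma, \mathbf{A}) \mid (\Gamma_L, \mathbf{A}_L)_L \mid U} \]

$A_{\mathrm{ex}\,\Pi,f}$: the set of lifetime parameters of f in Π

\[ \frac{\begin{gathered}
x{:}\, P\,(T_0 + T_1) \in \Gamma \\
\text{for } i = 0, 1,\quad (\Gamma_{L_i}, \mathbf{A}_{L_i}) = (\Gamma - \{ x{:}\, P\,(T_0 + T_1) \} + \{ y_i{:}\, P\, T_i \}, \mathbf{A})
\end{gathered}}{\mathsf{match}\ {*x}\ \{ \mathsf{inj}_0\, {*y_0} \to \mathsf{goto}\ L_0,\ \mathsf{inj}_1\, {*y_1} \to \mathsf{goto}\ L_1 \} :_{\Pi,f} (\Gamma, \mathbf{A}) \mid (\Gamma_L, \mathbf{A}_L)_L \mid U} \]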


The rule for the return statement ensures that there remain no extra variables 
and local lifetime variables. 


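Instruction. ‘$I :_{\Pi,f} (\Gamma, \mathbf{A}) \to (\Gamma', \mathbf{A}')$’ means that running the instruction I
(under Π, f) updates the whole context $(\Gamma, \mathbf{A})$ into $(\Gamma', \mathbf{A}')$. The rules are designed
so that, for any I, Π, f, $(\Gamma, \mathbf{A})$, there exists at most one $(\Gamma', \mathbf{A}')$ such that
$I :_{\Pi,f} (\Gamma, \mathbf{A}) \to (\Gamma', \mathbf{A}')$ holds. Below we present some of the rules; the complete
rules are presented in the full paper. The following is the typing rule for mutable
(re)borrow.

\[ \frac{\alpha \notin A_{\mathrm{ex}\,\Pi,f} \qquad P = \mathsf{own}, \mathsf{mut}_\beta \qquad \text{for any } \gamma \in \mathrm{Lifetime}_{P\,T},\ \alpha \le_{\mathbf{A}} \gamma}{\mathsf{let}\ y = \mathsf{mutbor}_\alpha\ x :_{\Pi,f} (\Gamma + \{ x{:}\, P\, T \}, \mathbf{A}) \to (\Gamma + \{ y{:}\, \mathsf{mut}_\alpha\, T,\ x{:}^{\dagger\alpha} P\, T \}, \mathbf{A})} \]

Lifetime_T: the set of lifetime variables occurring in T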


After you mutably (re)borrow an owning pointer / mutable reference $x$ until $\alpha$, $x$
is frozen until $\alpha$. Here, $\alpha$ should be a local lifetime variable¹⁸ (the first
precondition) that does not live longer than the data of $x$ (the third precondition). Below
are the typing rules for local lifetime variable introduction and elimination.


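\[
\mathtt{intro}\ \alpha :_{\Pi,f} (\Gamma, (A, R)) \to (\Gamma, (\{\alpha\} + A,\ \{\alpha\} \times (\{\alpha\} + A_{\mathrm{ex}\,\Pi,f}) + R))
\]

\[
\frac{\alpha \notin A_{\mathrm{ex}\,\Pi,f}}
{\mathtt{now}\ \alpha :_{\Pi,f} (\Gamma, (\{\alpha\} + A, R)) \to (\{\mathrm{thaw}_\alpha(x \colon^{a} T) \mid x \colon^{a} T \in \Gamma\},\ (A, \{(\beta, \gamma) \in R \mid \beta \ne \alpha\}))}
\]

\[
\mathrm{thaw}_\alpha(x \colon^{a} T) := \begin{cases} x \colon T & (a = \dagger\alpha) \\ x \colon^{a} T & (\text{otherwise}) \end{cases}
\]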


The rule for intro $\alpha$ just ensures that the new local lifetime variable is earlier than
any lifetime parameter (lifetime parameters are given by exterior functions). With now $\alpha$, the
variables frozen with $\alpha$ become active again. Below is the typing rule for dereference
of a pointer to a pointer, which may be of some interest.


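\[
\mathtt{let}\ y = {*x} :_{\Pi,f} (\Gamma + \{x \colon P\,P'\,T\}, \mathbf{A}) \to (\Gamma + \{y \colon (P \circ P')\,T\}, \mathbf{A})
\]

\[
P \circ \mathtt{own} = \mathtt{own} \circ P := P \qquad R_\alpha \circ R'_\beta := R''_\alpha \ \ \text{where } R'' = \begin{cases} \mathtt{mut} & (R = R' = \mathtt{mut}) \\ \mathtt{immut} & (\text{otherwise}) \end{cases}
\]


The third precondition of the typing rule for mutbor justifies taking just $\alpha$ in
the rule ‘$R_\alpha \circ R'_\beta := R''_\alpha$’.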


Let us interpret ‘$\Pi : (\Gamma_{f,L}, \mathbf{A}_{f,L})_{(f,L) \in \mathrm{FnLabel}_\Pi}$’ as “the program $\Pi$ has the
type $(\Gamma_{f,L}, \mathbf{A}_{f,L})_{(f,L) \in \mathrm{FnLabel}_\Pi}$”. The type system ensures that any program
has at most one type (which may be a bit unclear because of unstructured
control flows). Hereinafter, we implicitly assume that a program has a type.


2.3 Concrete Operational Semantics 


We introduce a concrete operational semantics for COR, which handles a concrete
model of the heap memory.
The basic item, the concrete configuration $\mathbf{C}$, is defined as follows.


$S ::= \mathtt{end} \mid [f, L]\ x, \mathbf{F};\ S$   (concrete configuration) $\mathbf{C} ::= [f, L]\ \mathbf{F};\ S \mid \mathbf{H}$

Here, $\mathbf{H}$ is a heap, which maps addresses (represented by integers) to integers
(data). $\mathbf{F}$ is a concrete stack frame, which maps variables to addresses. The stack


18 In COR, a reference that lives after the return from the function should be cre- 
ated by splitting a reference (e.g. ‘let (*yo, *y1) = *x’) given in the inputs; see also 
Expressivity and Limitations. 
part of $\mathbf{C}$ is of the form ‘$[f, L]\ \mathbf{F};\ [f', L']\ x, \mathbf{F}';\ \cdots;\ \mathtt{end}$’ (we may omit the terminator
‘; end’). $[f, L]$ on each stack frame indicates the program point. ‘$x,$’ on each non-top
stack frame is the receiver of the value returned by the function call.

Concrete operational semantics is characterized by the one-step transition
relation $\mathbf{C} \to_\Pi \mathbf{C}'$ and the termination relation $\mathrm{final}_\Pi(\mathbf{C})$, which can be
defined straightforwardly. Below we show the rules for mutable (re)borrow, swap,
function call and return from a function; the complete rules and an example
execution are presented in the full paper. $S_{\Pi,f,L}$ is the statement for the label
$L$ of the function $f$ in $\Pi$. $\mathrm{Ty}_{\Pi,f,L}(x)$ is the type of the variable $x$ at the label.


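\[
\frac{S_{\Pi,f,L} = \mathtt{let}\ y = \mathtt{mutbor}_\alpha\ x;\ \mathtt{goto}\ L' \quad \mathbf{F}(x) = a}
{[f, L]\ \mathbf{F};\ S \mid \mathbf{H} \ \to_\Pi\ [f, L']\ \mathbf{F} + \{(y, a)\};\ S \mid \mathbf{H}}
\]

\[
\frac{S_{\Pi,f,L} = \mathtt{swap}({*x}, {*y});\ \mathtt{goto}\ L' \quad \mathrm{Ty}_{\Pi,f,L}(x) = P\,T \quad \mathbf{F}(x) = a \quad \mathbf{F}(y) = b}
{\begin{array}{l}
[f, L]\ \mathbf{F};\ S \mid \mathbf{H} + \{(a+k, m_k) \mid k \in [\#T]\} + \{(b+k, n_k) \mid k \in [\#T]\} \\
\quad \to_\Pi\ [f, L']\ \mathbf{F};\ S \mid \mathbf{H} + \{(a+k, n_k) \mid k \in [\#T]\} + \{(b+k, m_k) \mid k \in [\#T]\}
\end{array}}
\]

\[
\frac{\begin{array}{c}
S_{\Pi,f,L} = \mathtt{let}\ y = g\langle\cdots\rangle(x_0, \dots, x_{n-1});\ \mathtt{goto}\ L' \\
\Sigma_{\Pi,g} = \langle\cdots\rangle(x'_0 \colon T_0, \dots, x'_{n-1} \colon T_{n-1}) \to U
\end{array}}
{[f, L]\ \mathbf{F} + \{(x_i, a_i) \mid i \in [n]\};\ S \mid \mathbf{H} \ \to_\Pi\ [g, \mathrm{entry}]\ \{(x'_i, a_i) \mid i \in [n]\};\ [f, L]\ y, \mathbf{F};\ S \mid \mathbf{H}}
\]

\[
\frac{S_{\Pi,f,L} = \mathtt{return}\ x}
{[f, L]\ \{(x, a)\};\ [g, L']\ x', \mathbf{F}';\ S \mid \mathbf{H} \ \to_\Pi\ [g, L']\ \mathbf{F}' + \{(x', a)\};\ S \mid \mathbf{H}}
\qquad
\frac{S_{\Pi,f,L} = \mathtt{return}\ x}
{\mathrm{final}_\Pi([f, L]\ \{(x, a)\} \mid \mathbf{H})}
\]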


Here we introduce ‘$\#T$’, which represents how many memory cells the type $T$
takes (at the outermost level). $\#T$ is defined for every complete type $T$, because
every occurrence of type variables in a complete type is guarded by a pointer
constructor.


\[
\#(T_0 + T_1) := 1 + \max\{\#T_0, \#T_1\} \qquad \#(T_0 \times T_1) := \#T_0 + \#T_1
\]
\[
\#\,\mu X.T := \#\,T[\mu X.T/X] \qquad \#\,\mathtt{int} = \#\,P\,T := 1 \qquad \#\,\mathtt{unit} = 0
\]
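
The size function above can be sketched in Rust as follows. This is only an illustration under our own assumptions: the enum `Ty` and the function `size` are hypothetical names that do not come from COR or RustHorn, pointer kinds and lifetimes are collapsed into a single `Ptr` case, and a recursive type has one anonymous binder. Because every type variable in a complete type is guarded by a pointer, the sketch never needs to unfold a recursive type; an unguarded variable is reported as `None`.

```rust
/// Hypothetical representation of COR types, just detailed enough for sizes.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int,
    Unit,
    Ptr(Box<Ty>),           // P T, for any pointer kind P
    Sum(Box<Ty>, Box<Ty>),  // T0 + T1
    Prod(Box<Ty>, Box<Ty>), // T0 × T1
    Rec(Box<Ty>),           // μX.T (single anonymous binder, an assumption of this sketch)
    Var,                    // X
}

/// Number of memory cells taken at the outermost level, or `None` if a type
/// variable occurs unguarded (i.e. the type is not complete).
fn size(t: &Ty) -> Option<usize> {
    match t {
        Ty::Int | Ty::Ptr(_) => Some(1),
        Ty::Unit => Some(0),
        // One cell for the tag plus room for the larger variant.
        Ty::Sum(t0, t1) => Some(1 + size(t0)?.max(size(t1)?)),
        Ty::Prod(t0, t1) => Some(size(t0)? + size(t1)?),
        // #μX.T = #T[μX.T/X]; guarded variables sit under `Ptr`, so the
        // substitution never affects the outermost size.
        Ty::Rec(body) => size(body),
        Ty::Var => None,
    }
}

/// The list type μX. int × own X + unit mentioned for Example 4.
fn list_ty() -> Ty {
    let cons = Ty::Prod(Box::new(Ty::Int), Box::new(Ty::Ptr(Box::new(Ty::Var))));
    Ty::Rec(Box::new(Ty::Sum(Box::new(cons), Box::new(Ty::Unit))))
}
```

For instance, `size(&list_ty())` evaluates to `Some(3)`: one cell for the tag and two for the larger (cons) variant.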


3 CHC Representation of COR Programs 


To formalize the idea discussed in § 1, we give a translation from COR programs
to CHC systems, which precisely characterize the input-output relations of the 
COR programs. We first define the logic for CHCs (§ 3.1). We then formally 
describe our translation (§3.2) and prove its correctness (§3.3). Also, we examine 
effectiveness of our approach with advanced examples (§ 3.4) and discuss how 
our idea can be extended and enhanced (§ 3.5). 


3.1 Multi-sorted Logic for Describing CHCs 


To begin with, we introduce a first-order multi-sorted logic for describing the 
CHC representation of COR programs. 
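Syntax. The syntax is defined as follows.


(CHC) $\Phi ::= \forall x_0 \colon \sigma_0, \dots, x_{m-1} \colon \sigma_{m-1}.\ \check\varphi \Longleftarrow \psi_0 \land \cdots \land \psi_{n-1}$
$\top$ := the nullary conjunction of formulas
(formula) $\varphi, \psi ::= f(t_0, \dots, t_{n-1})$   (elementary formula) $\check\varphi ::= f(p_0, \dots, p_{n-1})$
(term) $t ::= x \mid \langle t \rangle \mid \langle t_*, t_\circ \rangle \mid \mathtt{inj}_i\,t \mid (t_0, t_1) \mid {*t} \mid {\circ t} \mid t.i \mid \mathit{const} \mid t\ \mathit{op}\ t'$
(value) $v, w ::= \langle v \rangle \mid \langle v_*, v_\circ \rangle \mid \mathtt{inj}_i\,v \mid (v_0, v_1) \mid \mathit{const}$
(pattern) $p, q ::= x \mid \langle p \rangle \mid \langle p_*, p_\circ \rangle \mid \mathtt{inj}_i\,p \mid (p_0, p_1) \mid \mathit{const}$
(sort) $\sigma, \tau ::= X \mid \mu X.\sigma \mid C\,\sigma \mid \sigma_0 + \sigma_1 \mid \sigma_0 \times \sigma_1 \mid \mathtt{int} \mid \mathtt{unit}$
(container kind) $C ::= \mathtt{box} \mid \mathtt{mut}$   $\mathit{const}$ ::= same as COR   $\mathit{op}$ ::= same as COR


$\mathtt{bool} := \mathtt{unit} + \mathtt{unit}$   $\mathtt{true} := \mathtt{inj}_1\,()$   $\mathtt{false} := \mathtt{inj}_0\,()$
$X$ ::= (sort variable)   $x, y$ ::= (variable)   $f$ ::= (predicate variable)


We introduce $\mathtt{box}\,\sigma$ and $\mathtt{mut}\,\sigma$, which correspond to $\mathtt{own}\,T$ / $\mathtt{immut}_\alpha\,T$ and
$\mathtt{mut}_\alpha\,T$ respectively. $\langle t \rangle$ / $\langle t_*, t_\circ \rangle$ is the constructor for $\mathtt{box}\,\sigma$ / $\mathtt{mut}\,\sigma$. ${*t}$ takes the
body / first value of $\langle - \rangle$ / $\langle -, - \rangle$ and ${\circ t}$ takes the second value of $\langle -, - \rangle$. We restrict
the form of CHCs here to simplify the proofs later. Although the logic does not
have a primitive for equality, we can define the equality in a CHC system (e.g.
by adding $\forall x \colon \sigma.\ \mathit{Eq}(x, x) \Longleftarrow \top$).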

A CHC system $(\mathbf{\Phi}, \Xi)$ is a pair of a finite set of CHCs $\mathbf{\Phi} = \{\Phi_0, \dots, \Phi_{n-1}\}$
and $\Xi$, where $\Xi$ is a finite map from predicate variables to tuples of sorts,
specifying the sorts of the input values. Unlike the informal description
in § 1, we add $\Xi$ to a CHC system.


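Sort System. ‘$t :_\Delta \sigma$’ (the term $t$ has the sort $\sigma$ under $\Delta$) is defined as follows.
Here, $\Delta$ is a finite map from variables to sorts. $\sigma \sim \tau$ is the congruence on sorts
induced by $\mu X.\sigma \sim \sigma[\mu X.\sigma/X]$.


\[
\frac{\Delta(x) = \sigma}{x :_\Delta \sigma} \quad
\frac{t :_\Delta \sigma}{\langle t \rangle :_\Delta \mathtt{box}\,\sigma} \quad
\frac{t_*, t_\circ :_\Delta \sigma}{\langle t_*, t_\circ \rangle :_\Delta \mathtt{mut}\,\sigma} \quad
\frac{t :_\Delta \sigma_i}{\mathtt{inj}_i\,t :_\Delta \sigma_0 + \sigma_1} \quad
\frac{t_0 :_\Delta \sigma_0 \quad t_1 :_\Delta \sigma_1}{(t_0, t_1) :_\Delta \sigma_0 \times \sigma_1}
\]
\[
\frac{t :_\Delta C\,\sigma}{{*t} :_\Delta \sigma} \quad
\frac{t :_\Delta \mathtt{mut}\,\sigma}{{\circ t} :_\Delta \sigma} \quad
\frac{t :_\Delta \sigma_0 \times \sigma_1}{t.i :_\Delta \sigma_i} \quad
\mathit{const} :_\Delta \sigma_{\mathit{const}} \quad
\frac{t, t' :_\Delta \mathtt{int}}{t\ \mathit{op}\ t' :_\Delta \sigma_{\mathit{op}}} \quad
\frac{t :_\Delta \sigma \quad \sigma \sim \tau}{t :_\Delta \tau}
\]


$\sigma_{\mathit{const}}$: the sort of $\mathit{const}$   $\sigma_{\mathit{op}}$: the output sort of $\mathit{op}$


‘$\mathrm{wellSorted}_{\Delta,\Xi}(\varphi)$’ and ‘$\mathrm{wellSorted}_\Xi(\Phi)$’, the judgments on well-sortedness
of formulas and CHCs, are defined as follows.


\[
\frac{\Xi(f) = (\sigma_0, \dots, \sigma_{n-1}) \quad \text{for any } i \in [n],\ t_i :_\Delta \sigma_i}
{\mathrm{wellSorted}_{\Delta,\Xi}(f(t_0, \dots, t_{n-1}))}
\]

\[
\frac{\Delta = \{(x_i, \sigma_i) \mid i \in [m]\} \quad \mathrm{wellSorted}_{\Delta,\Xi}(\check\varphi) \quad \text{for any } j \in [n],\ \mathrm{wellSorted}_{\Delta,\Xi}(\psi_j)}
{\mathrm{wellSorted}_\Xi(\forall x_0 \colon \sigma_0, \dots, x_{m-1} \colon \sigma_{m-1}.\ \check\varphi \Longleftarrow \psi_0 \land \cdots \land \psi_{n-1})}
\]


The CHC system $(\mathbf{\Phi}, \Xi)$ is said to be well-sorted if $\mathrm{wellSorted}_\Xi(\Phi)$ holds for any
$\Phi \in \mathbf{\Phi}$.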


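Semantics. ‘$[\![t]\!]_I$’, the interpretation of the term $t$ as a value under $I$, is defined
as follows. Here, $I$ is a finite map from variables to values. Although the definition
is partial, the interpretation is defined for all well-sorted terms.

\[
[\![x]\!]_I := I(x) \quad [\![\langle t \rangle]\!]_I := \langle [\![t]\!]_I \rangle \quad [\![\langle t_*, t_\circ \rangle]\!]_I := \langle [\![t_*]\!]_I, [\![t_\circ]\!]_I \rangle \quad [\![\mathtt{inj}_i\,t]\!]_I := \mathtt{inj}_i\,[\![t]\!]_I
\]
\[
[\![(t_0, t_1)]\!]_I := ([\![t_0]\!]_I, [\![t_1]\!]_I) \quad
[\![{*t}]\!]_I := \begin{cases} v & ([\![t]\!]_I = \langle v \rangle) \\ v_* & ([\![t]\!]_I = \langle v_*, v_\circ \rangle) \end{cases} \quad
[\![{\circ t}]\!]_I := v_\circ \ \text{ if } [\![t]\!]_I = \langle v_*, v_\circ \rangle
\]
\[
[\![t.i]\!]_I := v_i \ \text{ if } [\![t]\!]_I = (v_0, v_1) \quad [\![\mathit{const}]\!]_I := \mathit{const} \quad [\![t\ \mathit{op}\ t']\!]_I := [\![t]\!]_I\ [\![\mathit{op}]\!]\ [\![t']\!]_I
\]

$[\![\mathit{op}]\!]$: the binary operation on values corresponding to $\mathit{op}$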


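A predicate structure $\mathcal{M}$ is a finite map from predicate variables to (concrete)
predicates on values. $\mathcal{M}, I \models f(t_0, \dots, t_{n-1})$ means that $\mathcal{M}(f)([\![t_0]\!]_I, \dots, [\![t_{n-1}]\!]_I)$
holds. $\mathcal{M} \models \Phi$ is defined as follows.


\[
\frac{\text{for any } I \text{ s.t. } \forall i \in [m].\ I(x_i) :_\emptyset \sigma_i,\ \ \mathcal{M}, I \models \psi_0, \dots, \psi_{n-1} \text{ implies } \mathcal{M}, I \models \check\varphi}
{\mathcal{M} \models \forall x_0 \colon \sigma_0, \dots, x_{m-1} \colon \sigma_{m-1}.\ \check\varphi \Longleftarrow \psi_0 \land \cdots \land \psi_{n-1}}
\]


Finally, $\mathcal{M} \models (\mathbf{\Phi}, \Xi)$ is defined as follows.


\[
\frac{\begin{array}{c}
\text{for any } (f, (\sigma_0, \dots, \sigma_{n-1})) \in \Xi,\ \mathcal{M}(f) \text{ is a predicate on values of sort } \sigma_0, \dots, \sigma_{n-1} \\
\mathrm{dom}\,\mathcal{M} = \mathrm{dom}\,\Xi \qquad \text{for any } \Phi \in \mathbf{\Phi},\ \mathcal{M} \models \Phi
\end{array}}
{\mathcal{M} \models (\mathbf{\Phi}, \Xi)}
\]


When $\mathcal{M} \models (\mathbf{\Phi}, \Xi)$ holds, we say that $\mathcal{M}$ is a model of $(\mathbf{\Phi}, \Xi)$. Every
well-sorted CHC system $(\mathbf{\Phi}, \Xi)$ has the least model on the point-wise ordering (which
can be proved based on the discussions in [16]), which we write as $\mathcal{M}^{\mathrm{least}}_{(\mathbf{\Phi}, \Xi)}$.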


3.2 Translation from COR Programs to CHCs 


Now we formalize our translation of Rust programs into CHCs. We define $(\!|\Pi|\!)$,
which is a CHC system that represents the input-output relations of the functions
in the COR program $\Pi$.

Roughly speaking, the least model $\mathcal{M}^{\mathrm{least}}_{(\!|\Pi|\!)}$ for this CHC system should
satisfy: for any values $v_0, \dots, v_{n-1}, w$, $\mathcal{M}^{\mathrm{least}}_{(\!|\Pi|\!)} \models f_{\mathrm{entry}}(v_0, \dots, v_{n-1}, w)$ holds exactly
if, in COR, a function call $f(v_0, \dots, v_{n-1})$ can return $w$. Actually, in concrete
operational semantics, such values should be read out from the heap memory.
The formal description and proof of this expected property are presented in § 3.3.


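Auxiliary Definitions. The sort corresponding to the type $T$, $(\!|T|\!)$, is defined
as follows. $\check{P}$ is a meta-variable for a non-mutable-reference pointer kind, i.e.
$\mathtt{own}$ or $\mathtt{immut}_\alpha$. Note that the information on lifetimes is all stripped off.


\[
(\!|X|\!) := X \quad (\!|\mu X.T|\!) := \mu X.(\!|T|\!) \quad (\!|\check{P}\,T|\!) := \mathtt{box}\,(\!|T|\!) \quad (\!|\mathtt{mut}_\alpha\,T|\!) := \mathtt{mut}\,(\!|T|\!)
\]
\[
(\!|\mathtt{int}|\!) := \mathtt{int} \quad (\!|\mathtt{unit}|\!) := \mathtt{unit} \quad (\!|T_0 + T_1|\!) := (\!|T_0|\!) + (\!|T_1|\!) \quad (\!|T_0 \times T_1|\!) := (\!|T_0|\!) \times (\!|T_1|\!)
\]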


We introduce a special variable res to represent the result of a function.¹⁹ For
a label $L$ in a function $f$ in a program $\Pi$, we define $\check\varphi_{\Pi,f,L}$, $\Xi_{\Pi,f,L}$ and $\Delta_{\Pi,f,L}$
as follows, if the items in the variable context for the label are enumerated as
$x_0 \colon^{a_0} T_0, \dots, x_{n-1} \colon^{a_{n-1}} T_{n-1}$ and the return type of the function is $U$.


\[
\check\varphi_{\Pi,f,L} := f_L(x_0, \dots, x_{n-1}, \mathit{res}) \qquad \Xi_{\Pi,f,L} := ((\!|T_0|\!), \dots, (\!|T_{n-1}|\!), (\!|U|\!))
\]
\[
\Delta_{\Pi,f,L} := \{(x_i, (\!|T_i|\!)) \mid i \in [n]\} + \{(\mathit{res}, (\!|U|\!))\}
\]

$\forall(\Delta)$ stands for $\forall x_0 \colon \sigma_0, \dots, x_{n-1} \colon \sigma_{n-1}$, where the items in $\Delta$ are enumerated
as $(x_0, \sigma_0), \dots, (x_{n-1}, \sigma_{n-1})$.


19 For simplicity, we assume that the parameters of each function are sorted respecting
some fixed order on variables (with res coming at the last), and we enumerate various
items in this fixed order.


CHC Representation. Now we introduce ‘$(\!|L \colon S|\!)_{\Pi,f}$’, the set (in most cases,
singleton) of CHCs modeling the computation performed by the labeled statement
$L \colon S$ in $f$ from $\Pi$. Unlike informal descriptions in § 1, we turn to pattern
matching instead of equations, to simplify the proofs. Below we show some of the
rules; the complete rules are presented in the full paper. The variables marked
green (e.g. $x_\circ$) should be fresh. The following is the rule for mutable (re)borrow.


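\[
(\!|L \colon \mathtt{let}\ y = \mathtt{mutbor}_\alpha\ x;\ \mathtt{goto}\ L'|\!)_{\Pi,f} :=
\begin{cases}
\bigl\{\forall(\Delta_{\Pi,f,L} + \{(x_\circ, (\!|T|\!))\}).\ \check\varphi_{\Pi,f,L} \Longleftarrow \check\varphi_{\Pi,f,L'}[\langle {*x}, x_\circ \rangle / y,\ \langle x_\circ \rangle / x]\bigr\} & (\mathrm{Ty}_{\Pi,f,L}(x) = \mathtt{own}\,T) \\
\bigl\{\forall(\Delta_{\Pi,f,L} + \{(x_\circ, (\!|T|\!))\}).\ \check\varphi_{\Pi,f,L} \Longleftarrow \check\varphi_{\Pi,f,L'}[\langle {*x}, x_\circ \rangle / y,\ \langle x_\circ, {\circ x} \rangle / x]\bigr\} & (\mathrm{Ty}_{\Pi,f,L}(x) = \mathtt{mut}_\alpha\,T)
\end{cases}
\]


The value at the end of borrow is represented as a newly introduced variable $x_\circ$.
Below is the rule for release of a variable.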


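\[
(\!|L \colon \mathtt{drop}\ x;\ \mathtt{goto}\ L'|\!)_{\Pi,f} :=
\begin{cases}
\bigl\{\forall(\Delta_{\Pi,f,L}).\ \check\varphi_{\Pi,f,L} \Longleftarrow \check\varphi_{\Pi,f,L'}\bigr\} & (\mathrm{Ty}_{\Pi,f,L}(x) = \check{P}\,T) \\
\bigl\{\forall(\Delta_{\Pi,f,L} - \{(x, \mathtt{mut}\,(\!|T|\!))\} + \{(x_*, (\!|T|\!))\}).\ \check\varphi_{\Pi,f,L}[\langle x_*, x_* \rangle / x] \Longleftarrow \check\varphi_{\Pi,f,L'}\bigr\} & (\mathrm{Ty}_{\Pi,f,L}(x) = \mathtt{mut}_\alpha\,T)
\end{cases}
\]


When a variable $x$ of type $\mathtt{mut}_\alpha\,T$ is dropped/released, we check the prophesied
value at the end of borrow. Below is the rule for a function call.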


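\[
(\!|L \colon \mathtt{let}\ y = g\langle\cdots\rangle(x_0, \dots, x_{n-1});\ \mathtt{goto}\ L'|\!)_{\Pi,f}
:= \bigl\{\forall(\Delta_{\Pi,f,L} + \{(y, (\!|\mathrm{Ty}_{\Pi,f,L'}(y)|\!))\}).\ \check\varphi_{\Pi,f,L} \Longleftarrow g_{\mathrm{entry}}(x_0, \dots, x_{n-1}, y) \land \check\varphi_{\Pi,f,L'}\bigr\}
\]


The body (the right-hand side of $\Longleftarrow$) of the CHC contains two formulas, which
yields a kind of call stack at the level of CHCs. Below is the rule for a return
from a function.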


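\[
(\!|L \colon \mathtt{return}\ x|\!)_{\Pi,f} := \bigl\{\forall(\Delta_{\Pi,f,L}).\ \check\varphi_{\Pi,f,L}[x / \mathit{res}] \Longleftarrow \top\bigr\}
\]


The variable res is forced to be equal to the returned variable $x$.
Finally, $(\!|\Pi|\!)$, the CHC system that represents the COR program $\Pi$ (or the
CHC representation of $\Pi$), is defined as follows.


\[
(\!|\Pi|\!) := \Bigl(\ \sum_{F \text{ in } \Pi,\ L \colon S \,\in\, \mathrm{LabelStmt}_F} (\!|L \colon S|\!)_{\Pi,\mathrm{name}_F},\ \ (\Xi_{\Pi,f,L})_{f_L \text{ s.t. } (f,L) \in \mathrm{FnLabel}_\Pi}\Bigr)
\]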


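Example 2 (CHC Representation). We present below the CHC representation
of take-max described in § 2.1. We omit CHCs on inc-max here. We have also
excluded the variable binders ‘$\forall \cdots$’.²⁰


\[
\begin{aligned}
\text{take-max}_{\mathrm{entry}}(ma, mb, \mathit{res}) &\Longleftarrow \text{take-max}_{L1}(ma, mb, \langle {*ma} \mathrel{>=} {*mb} \rangle, \mathit{res}) \\
\text{take-max}_{L1}(ma, mb, \langle \mathtt{inj}_1\,{*ou} \rangle, \mathit{res}) &\Longleftarrow \text{take-max}_{L2}(ma, mb, ou, \mathit{res}) \\
\text{take-max}_{L1}(ma, mb, \langle \mathtt{inj}_0\,{*ou} \rangle, \mathit{res}) &\Longleftarrow \text{take-max}_{L5}(ma, mb, ou, \mathit{res}) \\
\text{take-max}_{L2}(ma, mb, ou, \mathit{res}) &\Longleftarrow \text{take-max}_{L3}(ma, mb, \mathit{res}) \\
\text{take-max}_{L3}(ma, \langle mb_*, mb_* \rangle, \mathit{res}) &\Longleftarrow \text{take-max}_{L4}(ma, \mathit{res}) \\
\text{take-max}_{L4}(ma, ma) &\Longleftarrow \top \\
\text{take-max}_{L5}(ma, mb, ou, \mathit{res}) &\Longleftarrow \text{take-max}_{L6}(ma, mb, \mathit{res}) \\
\text{take-max}_{L6}(\langle ma_*, ma_* \rangle, mb, \mathit{res}) &\Longleftarrow \text{take-max}_{L7}(mb, \mathit{res}) \\
\text{take-max}_{L7}(mb, mb) &\Longleftarrow \top
\end{aligned}
\]


The fifth and eighth CHCs represent the release of mb/ma. The sixth and ninth CHCs
represent the determination of the return value res.


20 The sorts of the variables are as follows: $ma, mb, \mathit{res}$: mut int; $ma_*, mb_*$: int; $ou$: box unit.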


3.3 Correctness of the CHC Representation 


Now we formally state and prove the correctness of the CHC representation. 


Notations. We use $\{\!|\cdots|\!\}$ (instead of $\{\cdots\}$) for the intensional description of
a multiset. $A \oplus B$ (or more generally $\bigoplus_\lambda A_\lambda$) denotes the multiset sum (e.g.
$\{\!|0, 1|\!\} \oplus \{\!|1|\!\} = \{\!|0, 1, 1|\!\} \ne \{\!|0, 1|\!\}$).


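Readout and Safe Readout. We introduce a few judgments to formally describe
how to read out data from the heap.

First, the judgment ‘$\mathrm{readout}_{\mathbf{H}}({*a} :: T \mid v; \mathcal{M})$’ (the data at the address $a$ of
type $T$ can be read out from the heap $\mathbf{H}$ as the value $v$, yielding the memory
footprint $\mathcal{M}$) is defined as follows.²¹ Here, a memory footprint $\mathcal{M}$ is a finite
multiset of addresses, which is employed for monitoring the memory usage.

\[
\frac{\mathbf{H}(a) = a' \quad \mathrm{readout}_{\mathbf{H}}({*a'} :: T \mid v; \mathcal{M})}
{\mathrm{readout}_{\mathbf{H}}({*a} :: \mathtt{own}\,T \mid \langle v \rangle; \mathcal{M} \oplus \{\!|a|\!\})}
\qquad
\frac{\mathrm{readout}_{\mathbf{H}}({*a} :: T[\mu X.T/X] \mid v; \mathcal{M})}
{\mathrm{readout}_{\mathbf{H}}({*a} :: \mu X.T \mid v; \mathcal{M})}
\]
\[
\frac{\mathbf{H}(a) = n}{\mathrm{readout}_{\mathbf{H}}({*a} :: \mathtt{int} \mid n; \{\!|a|\!\})}
\qquad
\mathrm{readout}_{\mathbf{H}}({*a} :: \mathtt{unit} \mid (); \emptyset)
\]
\[
\frac{\begin{array}{c}
\mathbf{H}(a) = i \in [2] \qquad \text{for any } k \in [(\#T_{1-i} - \#T_i)_{\ge 0}],\ \mathbf{H}(a + 1 + \#T_i + k) = 0 \\
\mathrm{readout}_{\mathbf{H}}({*(a+1)} :: T_i \mid v; \mathcal{M})
\end{array}}
{\mathrm{readout}_{\mathbf{H}}({*a} :: T_0 + T_1 \mid \mathtt{inj}_i\,v;\ \mathcal{M} \oplus \{\!|a|\!\} \oplus \{\!|a + 1 + \#T_i + k \mid k \in [(\#T_{1-i} - \#T_i)_{\ge 0}]|\!\})}
\qquad (n)_{\ge 0} := \max\{n, 0\}
\]
\[
\frac{\mathrm{readout}_{\mathbf{H}}({*a} :: T_0 \mid v_0; \mathcal{M}_0) \quad \mathrm{readout}_{\mathbf{H}}({*(a + \#T_0)} :: T_1 \mid v_1; \mathcal{M}_1)}
{\mathrm{readout}_{\mathbf{H}}({*a} :: T_0 \times T_1 \mid (v_0, v_1); \mathcal{M}_0 \oplus \mathcal{M}_1)}
\]


For example, ‘$\mathrm{readout}_{\{(100,7),(101,5)\}}({*100} :: \mathtt{int} \times \mathtt{int} \mid (7, 5); \{\!|100, 101|\!\})$’ holds.


Next, ‘$\mathrm{readout}_{\mathbf{H}}(\mathbf{F} :: \Gamma \mid \mathcal{F}; \mathcal{M})$’ (the data of the stack frame $\mathbf{F}$ respecting
the variable context $\Gamma$ can be read out from $\mathbf{H}$ as $\mathcal{F}$, yielding $\mathcal{M}$) is defined as
follows. $\mathrm{dom}\,\Gamma$ stands for $\{x \mid x \colon^{a} T \in \Gamma\}$.

\[
\frac{\mathrm{dom}\,\mathbf{F} = \mathrm{dom}\,\Gamma \quad \text{for any } x \colon \mathtt{own}\,T \in \Gamma,\ \mathrm{readout}_{\mathbf{H}}({*\mathbf{F}(x)} :: T \mid v_x; \mathcal{M}_x)}
{\mathrm{readout}_{\mathbf{H}}(\mathbf{F} :: \Gamma \mid \{(x, \langle v_x \rangle) \mid x \in \mathrm{dom}\,\mathbf{F}\};\ \bigoplus_{x \in \mathrm{dom}\,\mathbf{F}} \mathcal{M}_x)}
\]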


21 Here we can ignore mutable/immutable references, because we focus on what we call 
simple functions, as explained later. 
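Finally, ‘$\mathrm{safe}_{\mathbf{H}}(\mathbf{F} :: \Gamma \mid \mathcal{F})$’ (the data of $\mathbf{F}$ respecting $\Gamma$ can be safely read
out from $\mathbf{H}$ as $\mathcal{F}$) is defined as follows.


\[
\frac{\mathrm{readout}_{\mathbf{H}}(\mathbf{F} :: \Gamma \mid \mathcal{F}; \mathcal{M}) \quad \mathcal{M} \text{ has no duplicate items}}
{\mathrm{safe}_{\mathbf{H}}(\mathbf{F} :: \Gamma \mid \mathcal{F})}
\]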


Here, the ‘no duplicate items’ precondition checks the safety on the ownership. 
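
To make the readout judgment and the ‘no duplicate items’ check concrete, here is a minimal Rust sketch. It rests on several assumptions of ours: the names `Ty`, `Value`, `Heap`, `readout` and `safe_readout` are hypothetical, recursive types are left out, and the safety check is applied to a single address rather than to a whole stack frame. The footprint is returned as a `Vec` of addresses standing in for the multiset.

```rust
use std::collections::HashMap;
use std::convert::TryFrom;

#[derive(Debug, Clone, PartialEq)]
enum Ty { Int, Unit, Own(Box<Ty>), Sum(Box<Ty>, Box<Ty>), Prod(Box<Ty>, Box<Ty>) }

#[derive(Debug, Clone, PartialEq)]
enum Value { Int(i64), Unit, Boxed(Box<Value>), Inj(usize, Box<Value>), Pair(Box<Value>, Box<Value>) }

/// The heap maps addresses to integers.
type Heap = HashMap<usize, i64>;

/// #T: the number of cells taken at the outermost level.
fn size(t: &Ty) -> usize {
    match t {
        Ty::Int | Ty::Own(_) => 1,
        Ty::Unit => 0,
        Ty::Sum(t0, t1) => 1 + size(t0).max(size(t1)),
        Ty::Prod(t0, t1) => size(t0) + size(t1),
    }
}

/// Reads the data of type `t` at address `a`, returning the value and the
/// memory footprint (with repetitions), or `None` if no rule applies.
fn readout(h: &Heap, a: usize, t: &Ty) -> Option<(Value, Vec<usize>)> {
    match t {
        Ty::Int => Some((Value::Int(*h.get(&a)?), vec![a])),
        Ty::Unit => Some((Value::Unit, vec![])),
        Ty::Own(inner) => {
            let target = usize::try_from(*h.get(&a)?).ok()?;
            let (v, mut m) = readout(h, target, inner)?;
            m.push(a);
            Some((Value::Boxed(Box::new(v)), m))
        }
        Ty::Prod(t0, t1) => {
            let (v0, mut m) = readout(h, a, t0)?;
            let (v1, m1) = readout(h, a + size(t0), t1)?;
            m.extend(m1);
            Some((Value::Pair(Box::new(v0), Box::new(v1)), m))
        }
        Ty::Sum(t0, t1) => {
            let tag = usize::try_from(*h.get(&a)?).ok()?;
            let (ti, other) = match tag { 0 => (t0, t1), 1 => (t1, t0), _ => return None };
            let (v, mut m) = readout(h, a + 1, ti)?;
            m.push(a);
            // Padding cells of the smaller variant must hold 0 and are counted too.
            for k in 0..size(other).saturating_sub(size(ti)) {
                let addr = a + 1 + size(ti) + k;
                if *h.get(&addr)? != 0 { return None; }
                m.push(addr);
            }
            Some((Value::Inj(tag, Box::new(v)), m))
        }
    }
}

/// Safe readout: succeeds only if no address occurs twice in the footprint.
fn safe_readout(h: &Heap, a: usize, t: &Ty) -> Option<Value> {
    let (v, mut m) = readout(h, a, t)?;
    m.sort_unstable();
    let n = m.len();
    m.dedup();
    if m.len() == n { Some(v) } else { None }
}
```

On the heap {(100, 7), (101, 5)}, reading `int × int` at address 100 gives the pair (7, 5) with footprint [100, 101], matching the example above. Two owning pointers that alias the same cell can still be read out, but the duplicate address in the footprint makes the safe readout fail.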


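COS-based Model. Now we introduce the COS-based model (COS stands for
concrete operational semantics) $f^{\mathrm{COS}}_\Pi$ to formally describe the expected
input-output relation. Here, for simplicity, $f$ is restricted to one that does not take
lifetime parameters (we call such a function simple; the input/output types
of a simple function cannot contain references). We define $f^{\mathrm{COS}}_\Pi$ as the
predicate (on values of sorts $(\!|T_0|\!), \dots, (\!|T_{n-1}|\!), (\!|U|\!)$ if $f$’s input/output types are
$T_0, \dots, T_{n-1}, U$) given by the following rule.


\[
\frac{\begin{array}{c}
\mathbf{C}_0 \to_\Pi \cdots \to_\Pi \mathbf{C}_N \quad \mathrm{final}_\Pi(\mathbf{C}_N) \quad \mathbf{C}_0 = [f, \mathrm{entry}]\ \mathbf{F} \mid \mathbf{H} \quad \mathbf{C}_N = [f, L]\ \mathbf{F}' \mid \mathbf{H}' \\
\mathrm{safe}_{\mathbf{H}}(\mathbf{F} :: \Gamma_{\Pi,f,\mathrm{entry}} \mid \{(x_i, v_i) \mid i \in [n]\}) \quad \mathrm{safe}_{\mathbf{H}'}(\mathbf{F}' :: \Gamma_{\Pi,f,L} \mid \{(y, w)\})
\end{array}}
{f^{\mathrm{COS}}_\Pi(v_0, \dots, v_{n-1}, w)}
\]


$\Gamma_{\Pi,f,L}$: the variable context for the label $L$ of $f$ in the program $\Pi$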


Correctness Theorem. Finally, the correctness (both soundness and completeness)
of the CHC representation is simply stated as follows.


Theorem 1 (Correctness of the CHC Representation). For any program
$\Pi$ and simple function $f$ in $\Pi$, $f^{\mathrm{COS}}_\Pi$ is equivalent to $\mathcal{M}^{\mathrm{least}}_{(\!|\Pi|\!)}(f_{\mathrm{entry}})$.

Proof. The details are presented in the full paper. We outline the proof below.

First, we introduce abstract operational semantics, where we get rid of heaps 
and directly represent each variable in the program simply as a value with ab- 
stract variables, which is strongly related to prophecy variables (see § 5). An 
abstract variable represents the undetermined value of a mutable reference at 
the end of borrow. 

Next, we introduce SLDC resolution for CHC systems and find a bisimula- 
tion between abstract operational semantics and SLDC resolution, whereby we 
show that the AOS-based model, defined analogously to the COS-based model, 
is equivalent to the least model of the CHC representation. Moreover, we find 
a bisimulation between concrete and abstract operational semantics and prove 
that the COS-based model is equivalent to the AOS-based model. 

Finally, combining the equivalences, we achieve the proof for the correctness 
of the CHC representation. 


Interestingly, as by-products of the proof, we have also shown the soundness 
of the type system in terms of preservation and progression, in both concrete and 
abstract operational semantics. Simplification and generalization of the proofs 
is left for future work. 
3.4 Advanced Examples


We give advanced examples of pointer-manipulating Rust programs and their 
CHC representations. For readability, we write programs in Rust (with ghost 
annotations) instead of COR. In addition, CHCs are written in an informal style 
like § 1, preferring equalities to pattern matching. 


Example 3. Consider the following program, a variant of just_rec in § 1.1. 


fn choose<'a>(ma: &'a mut i32, mb: &'a mut i32) -> &'a mut i32 {
  if rand() { drop ma; mb } else { drop mb; ma }
}
fn linger_dec<'a>(ma: &'a mut i32) -> bool {
  *ma -= 1; if rand() >= 0 { drop ma; return true; }
  let mut b = rand(); let old_b = b; intro 'b; let mb = &'b mut b;
  let r2 = linger_dec<'b>(choose<'b>(ma, mb)); now 'b;
  r2 && old_b >= b
}


Unlike just_rec, the function linger_dec can modify the local variable of an 
arbitrarily deep ancestor. Interestingly, each recursive call to linger_dec can 
introduce a new lifetime 'b, which yields arbitrarily many layers of lifetimes. 
Suppose we wish to verify that linger_dec never returns false. If we use, 
like JustRec, in § 1.1, a predicate taking the memory states h, h’ and the stack 
pointer sp, we have to discover the quantified invariant: $\forall i \le sp.\ h[i] \ge h'[i]$. In
contrast, our approach reduces this verification problem to the following CHCs: 


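\[
\begin{aligned}
\mathit{Choose}(\langle a, a_\circ \rangle, \langle b, b_\circ \rangle, r) &\Longleftarrow b_\circ = b \land r = \langle a, a_\circ \rangle \\
\mathit{Choose}(\langle a, a_\circ \rangle, \langle b, b_\circ \rangle, r) &\Longleftarrow a_\circ = a \land r = \langle b, b_\circ \rangle \\
\mathit{LingerDec}(\langle a, a_\circ \rangle, r) &\Longleftarrow a' = a - 1 \land a_\circ = a' \land r = \mathtt{true} \\
\mathit{LingerDec}(\langle a, a_\circ \rangle, r) &\Longleftarrow a' = a - 1 \land \mathit{oldb} = b \land \mathit{Choose}(\langle a', a_\circ \rangle, \langle b, b_\circ \rangle, mc) \\
&\qquad \land\ \mathit{LingerDec}(mc, r') \land r = (r' \mathrel{\&\&} \mathit{oldb} \mathrel{>=} b_\circ) \\
r = \mathtt{true} &\Longleftarrow \mathit{LingerDec}(\langle a, a_\circ \rangle, r).
\end{aligned}
\]


This can be solved by many solvers since it has a very simple model:

\[
\begin{aligned}
\mathit{Choose}(\langle a, a_\circ \rangle, \langle b, b_\circ \rangle, r) &:\Longleftrightarrow (b_\circ = b \land r = \langle a, a_\circ \rangle) \lor (a_\circ = a \land r = \langle b, b_\circ \rangle) \\
\mathit{LingerDec}(\langle a, a_\circ \rangle, r) &:\Longleftrightarrow r = \mathtt{true} \land a \ge a_\circ.
\end{aligned}
\]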
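
As a sanity check on this model, the Rust sketch below tests every CHC of Example 3 against the candidate predicates by brute force over a small integer range. It is an illustration only: the names `MutRef`, `choose_model`, `linger_dec_model` and `satisfies_chcs` are ours, a mutable reference is encoded as a pair (current value, value at the end of the borrow), and a bounded enumeration is evidence, not a proof of validity.

```rust
/// A mutable reference as a pair (current value, value at the end of the borrow).
type MutRef = (i64, i64);

/// Candidate model for Choose: the disjunction of its two clause bodies.
fn choose_model(ma: MutRef, mb: MutRef, r: MutRef) -> bool {
    (mb.1 == mb.0 && r == ma) || (ma.1 == ma.0 && r == mb)
}

/// Candidate model for LingerDec: r = true and a >= a_end.
fn linger_dec_model(ma: MutRef, r: bool) -> bool {
    r && ma.0 >= ma.1
}

/// Checks all CHCs of Example 3 for every assignment with values in lo..=hi,
/// using `linger_dec` as the interpretation of LingerDec.
fn satisfies_chcs(linger_dec: fn(MutRef, bool) -> bool, lo: i64, hi: i64) -> bool {
    let bools = [false, true];
    for a in lo..=hi {
        for a_end in lo..=hi {
            let ma = (a, a_end);
            // Third CHC: a' = a - 1, a_end = a', r = true.
            if a_end == a - 1 && !linger_dec(ma, true) { return false; }
            // Goal clause: LingerDec(ma, r) implies r = true.
            for &r in &bools {
                if linger_dec(ma, r) && !r { return false; }
            }
            for b in lo..=hi {
                for b_end in lo..=hi {
                    let mb = (b, b_end);
                    // First and second CHCs (the two branches of choose).
                    if b_end == b && !choose_model(ma, mb, ma) { return false; }
                    if a_end == a && !choose_model(ma, mb, mb) { return false; }
                    // Fourth CHC: the recursive case of linger_dec.
                    for c in lo..=hi { for c_end in lo..=hi { for &r1 in &bools {
                        let mc = (c, c_end);
                        let body = choose_model((a - 1, a_end), mb, mc) && linger_dec(mc, r1);
                        if body && !linger_dec(ma, r1 && b >= b_end) { return false; }
                    }}}
                }
            }
        }
    }
    true
}
```

Calling `satisfies_chcs(linger_dec_model, -2, 2)` exercises every clause on that range. Passing a different interpretation of LingerDec makes it easy to see which candidate invariants break the recursive clause.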


Example 4. Combined with recursive data structures, our method turns out to
be more interesting. Let us consider the following Rust code:²²


enum List { Cons(i32, Box<List>), Nil } use List::*;
fn take_some<'a>(mxs: &'a mut List) -> &'a mut i32 {
  match mxs {
    Cons(mx, mxs2) => if rand() { drop mxs2; mx }
                      else { drop mx; take_some<'a>(mxs2) }
    Nil => { take_some(mxs) }
  }
}
fn sum(xs: &List) -> i32 {
  match xs { Cons(x, xs2) => x + sum(xs2), Nil => 0 }
}
fn inc_some(mut xs: List) -> bool {
  let n = sum(&xs); intro 'a; let my = take_some<'a>(&'a mut xs);
  *my += 1; drop my; now 'a; let m = sum(&xs); m == n + 1
}


22 In COR, List can be expressed as $\mu X.\ \mathtt{int} \times \mathtt{own}\ X + \mathtt{unit}$.


This is a program that manipulates singly linked integer lists, defined as a re- 
cursive data type. take_some takes a mutable reference to a list and returns 
a mutable reference to some element of the list. sum calculates the sum of the 
elements of a list. inc_some increments some element of a list via a mutable 
reference and checks that the sum of the elements of the list has increased by 1. 

Suppose we wish to verify that inc_some never returns false. Our method
translates this verification problem into the following CHCs.²³


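\[
\begin{aligned}
\mathit{TakeSome}(\langle [x|xs'], xs_\circ \rangle, r) &\Longleftarrow xs_\circ = [x_\circ|xs'_\circ] \land xs'_\circ = xs' \land r = \langle x, x_\circ \rangle \\
\mathit{TakeSome}(\langle [x|xs'], xs_\circ \rangle, r) &\Longleftarrow xs_\circ = [x_\circ|xs'_\circ] \land x_\circ = x \land \mathit{TakeSome}(\langle xs', xs'_\circ \rangle, r) \\
\mathit{TakeSome}(\langle [], xs_\circ \rangle, r) &\Longleftarrow \mathit{TakeSome}(\langle [], xs_\circ \rangle, r) \\
\mathit{Sum}(\langle [x|xs'] \rangle, r) &\Longleftarrow \mathit{Sum}(\langle xs' \rangle, r') \land r = x + r' \\
\mathit{Sum}(\langle [] \rangle, r) &\Longleftarrow r = 0 \\
\mathit{IncSome}(xs, r) &\Longleftarrow \mathit{Sum}(\langle xs \rangle, n) \land \mathit{TakeSome}(\langle xs, xs_\circ \rangle, \langle y, y_\circ \rangle) \land y_\circ = y + 1 \\
&\qquad \land\ \mathit{Sum}(\langle xs_\circ \rangle, m) \land r = (m \mathrel{==} n + 1).
\end{aligned}
\]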


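A crucial technique used here is subdivision of a mutable reference, which is
achieved with the constraint $xs_\circ = [x_\circ|xs'_\circ]$.

We can give this CHC system a very simple model, using an auxiliary function
sum (satisfying $\mathrm{sum}([x|xs']) := x + \mathrm{sum}(xs')$, $\mathrm{sum}([]) := 0$):

\[
\begin{aligned}
\mathit{TakeSome}(\langle xs, xs_\circ \rangle, \langle y, y_\circ \rangle) &:\Longleftrightarrow y_\circ - y = \mathrm{sum}(xs_\circ) - \mathrm{sum}(xs) \\
\mathit{Sum}(\langle xs \rangle, r) &:\Longleftrightarrow r = \mathrm{sum}(xs) \\
\mathit{IncSome}(xs, r) &:\Longleftrightarrow r = \mathtt{true}.
\end{aligned}
\]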

Although the model relies on the function sum, the validity of the model can be 
checked without induction on sum (i.e. we can check the validity of each CHC 
just by properly unfolding the definition of sum a few times). 
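
The unfolding argument can be illustrated with a bounded check in Rust. This is a sketch under our own assumptions: lists are plain slices of `i64`, the names `sum`, `take_some_model`, `small_lists` and `model_satisfies_chcs` are hypothetical, and only lists up to a fixed length over a small value range are enumerated, so the check supports the claim without proving it. The two Sum clauses hold by the definition of sum and the third TakeSome clause is a tautology, so they are not tested.

```rust
/// The auxiliary function sum on lists.
fn sum(xs: &[i64]) -> i64 {
    xs.iter().sum()
}

/// Candidate model for TakeSome: y_end - y = sum(xs_end) - sum(xs).
fn take_some_model(xs: &[i64], xs_end: &[i64], y: i64, y_end: i64) -> bool {
    y_end - y == sum(xs_end) - sum(xs)
}

/// All lists of length at most `max_len` with elements drawn from `vals`.
fn small_lists(max_len: usize, vals: &[i64]) -> Vec<Vec<i64>> {
    let mut all: Vec<Vec<i64>> = vec![vec![]];
    let mut frontier: Vec<Vec<i64>> = vec![vec![]];
    for _ in 0..max_len {
        let mut next = Vec::new();
        for l in &frontier {
            for &v in vals {
                let mut longer = l.clone();
                longer.push(v);
                next.push(longer);
            }
        }
        all.extend(next.iter().cloned());
        frontier = next;
    }
    all
}

/// Checks the first two TakeSome clauses and the IncSome clause on small lists.
fn model_satisfies_chcs(max_len: usize, lo: i64, hi: i64) -> bool {
    let vals: Vec<i64> = (lo..=hi).collect();
    let lists = small_lists(max_len, &vals);
    for xs in &lists {
        for xs_end in &lists {
            // IncSome: with y_end = y + 1, the final sum must be n + 1.
            for y in (lo - 1)..=(hi + 1) {
                if take_some_model(xs, xs_end, y, y + 1) && sum(xs_end) != sum(xs) + 1 {
                    return false;
                }
            }
            // Both TakeSome clauses need a cons cell before and after the borrow.
            if let (Some((&x, tail)), Some((&x_end, tail_end))) = (xs.split_first(), xs_end.split_first()) {
                // First clause: the head is returned and the tail is unchanged.
                if tail_end == tail && !take_some_model(xs, xs_end, x, x_end) { return false; }
                // Second clause: the head is unchanged and the call recurses on the tail.
                if x_end == x {
                    for y in (lo - 1)..=(hi + 1) {
                        for y_end in (lo - 1)..=(hi + 1) {
                            if take_some_model(tail, tail_end, y, y_end)
                                && !take_some_model(xs, xs_end, y, y_end) { return false; }
                        }
                    }
                }
            }
        }
    }
    true
}
```

Each clause is discharged by unfolding sum once on a cons cell, which is what the comparison of `sum(xs_end) - sum(xs)` with `sum(tail_end) - sum(tail)` amounts to when the heads agree.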

The example can be fully automatically and promptly verified by our approach
using HoIce [12,11] as the back-end CHC solver; see § 4.


3.5 Discussions 


We discuss here how our idea can be extended and enhanced. 


23 $[x|xs]$ is the cons made of the head $x$ and the tail $xs$. $[]$ is the nil. In our formal logic,
they are expressed as $\mathtt{inj}_0\,(x, \langle xs \rangle)$ and $\mathtt{inj}_1\,()$.
Applying Various Verification Techniques. Our idea can also be expressed as a
translation of a pointer-manipulating Rust program into a program of a stateless
functional programming language, which allows us to use various verification
techniques not limited to CHCs. Access to future information can be modeled
using non-determinism. To express the value $a_\circ$ coming at the end of mutable
borrow in CHCs, we just randomly guess the value with non-determinism. At
the time we actually release a mutable reference, we just check a' = a and cut
off execution branches that do not pass the check.

For example, take_max/inc_max in § 1.2/Example 1 can be translated into 
the following OCaml program. 


let rec assume b = if b then () else assume b 
let take_max (a, a') (b, b') = 
if a >= b then (assume (b' = b); (a, a')) 
else (assume (a' = a); (b, b')) 
let inc_max a b = 
let a' = Random.int(0) in let b' = Random.int(0) in 
let (c, c') = take_max (a, a') (b, b') in 
assume (c' = c+ 1); not (a' = b') 
let main a b = assert (inc_max a b) 


‘let a' = Random.int(0)’ expresses a random guess and ‘assume (a' = a)’ 
expresses a check. The original problem “Does inc_max never return false?”
is reduced to the problem “Does main never fail at assertion?” on the OCaml
program.²⁴
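
The same guess-and-check idea can be sketched in Rust by replacing the random guess with an exhaustive enumeration over a finite range, and `assume` with an `Option` that cuts off a branch when the check fails. This is our own illustration, not part of RustHorn: the names `MutRef`, `assume`, `take_max`, `inc_max` and `inc_max_outcomes` and the explicit range bounds are assumptions of the sketch.

```rust
/// A mutable reference as a pair (current value, guessed value at the end of the borrow).
type MutRef = (i64, i64);

/// `assume b` keeps the current branch only if `b` holds.
fn assume(b: bool) -> Option<()> {
    if b { Some(()) } else { None }
}

/// take_max releases the reference it does not return, which fixes that
/// reference's final value to its current value.
fn take_max(ma: MutRef, mb: MutRef) -> Option<MutRef> {
    if ma.0 >= mb.0 {
        assume(mb.1 == mb.0)?;
        Some(ma)
    } else {
        assume(ma.1 == ma.0)?;
        Some(mb)
    }
}

/// inc_max for one particular guess (a_end, b_end) of the final values.
fn inc_max(a: i64, b: i64, a_end: i64, b_end: i64) -> Option<bool> {
    let (c, c_end) = take_max((a, a_end), (b, b_end))?;
    assume(c_end == c + 1)?; // the increment through the returned reference
    Some(a_end != b_end)
}

/// Results of all branches that survive the checks, for guesses in lo..=hi.
fn inc_max_outcomes(a: i64, b: i64, lo: i64, hi: i64) -> Vec<bool> {
    let mut out = Vec::new();
    for a_end in lo..=hi {
        for b_end in lo..=hi {
            if let Some(r) = inc_max(a, b, a_end, b_end) {
                out.push(r);
            }
        }
    }
    out
}
```

For inputs whose incremented value lies inside the guessed range, exactly one guess survives all checks, and it corresponds to the actual execution of the original program.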

This representation allows us to use various verification techniques, including 
model checking (higher-order, temporal, bounded, etc.), semi-automated verifi- 
cation (e.g. on Boogie [48]) and verification on proof assistants (e.g. Coq [15]). 
The property to be verified can be not only partial correctness, but also total 
correctness and liveness. Further investigation is left for future work. 


Verifying Higher-order Programs. We have to care about the following points in 
modeling closures: (i) A closure that encloses mutable references can be encoded 
as a pair of the main function and the ‘drop function’ called when the closure is 
released; (ii) A closure that updates enclosed data can be encoded as a function 
that returns, with the main return value, the updated version of the closure; 
(iii) A closure that updates external data through enclosed mutable references 
can also be modeled by combination of (i) and (ii). Further investigation on 
verification of higher-order Rust programs is left for future work. 


Libraries with Unsafe Code. Our translation does not use lifetime information; 
the correctness of our method is guaranteed by the nature of borrow. Whereas 


24 MoCHi [39], a higher-order model checker for OCaml, successfully verified the safety 
property for the OCaml representation above. It also successfully and instantly ver- 
ified a similar representation of choose/linger_dec at Example 3. 
lifetimes are used for static check of the borrow discipline, many libraries in Rust
(e.g. RefCell) provide a mechanism for dynamic ownership check.

We believe that such libraries with unsafe code can be verified for use with our method
by a separation logic such as Iris [35,33], as RustBelt [32] does. The good news
is that Iris has recently incorporated prophecy variables [34], which seem to fit
well with our approach. This is an interesting topic for future work.

After the libraries are verified, we can turn to our method. For an easy
example, Vec [58] can be represented simply as a functional array; a
mutable/immutable slice &mut[T]/&[T] can be represented as an array of
mutable/immutable references. For another example, to deal with RefCell [56], we
pass around an array that maps a RefCell<T> address to data of type T equipped
with an ownership counter; RefCell itself is modeled simply as an address.²⁵ ²⁶
Importantly, at the very time we take a mutable reference $\langle a, a_\circ \rangle$ from a ref-cell,
the data at the array should be updated into $a_\circ$. Using methods such as pointer
analysis [61], we can possibly shrink the array.

Still, our method does not cope well with memory leaks [52] caused for
example by combination of RefCell and Rc [57], because they obfuscate the
ownership release of mutable references. We think that use of Rc etc. should
rather be restricted for smooth verification. Further investigation is needed.


4 Implementation and Evaluation 


We report on the implementation of our verification tool and the preliminary 
experiments conducted with small benchmarks to confirm the effectiveness of 
our approach. 


4.1 Implementation of RustHorn 


We implemented a prototype verification tool RustHorn (available at
https://github.com/hopv/rust-horn) based on the ideas described above. The tool
supports basic features of Rust supported in COR, including in particular recursion
and recursive types.

The implementation translates the MIR (Mid-level Intermediate Representation)
[45,51] of a Rust program into CHCs quite straightforwardly.²⁷ Thanks to
the nature of the translation, RustHorn can just rely on Rust’s borrow check and
forget about lifetimes. For efficiency, the predicate variables are constructed at
the granularity of the vertices in the control-flow graph in MIR, unlike the per-label
construction of § 3.2. Also, assertions in functions are taken into account,
unlike the formalization in § 3.2.


25 To borrow a mutable/immutable reference from RefCell, we check and update the 
counter and take out the data from the array. 

26 In Rust, we can use RefCell to naturally encode data types with circular references 
(e.g. doubly-linked lists). 

27 In order to use the MIR, RustHorn’s implementation depends on the unstable nightly
version of the Rust compiler, which causes a slight portability issue.




4.2 Benchmarks and Experiments 


To measure the performance of RustHorn and the existing CHC-based verifier
SeaHorn [23], we conducted preliminary experiments with benchmarks listed in
Table 1. Each benchmark program is designed so that the Rust and C versions
match. Each benchmark instance consists of either one program or a pair of safe
and unsafe programs that are very similar to each other. The benchmarks and
experimental results are accessible at https://github.com/hopv/rust-horn.

The benchmarks in the groups simple and bmc were taken from SeaHorn
(https://github.com/seahorn/seahorn/tree/master/test), with the Rust
versions written by us. They have been chosen based on the following criteria:
they (i) consist of only features supported by core Rust, (ii) follow Rust’s ownership
discipline, and (iii) are small enough to be amenable to manual translation
from C to Rust.

The remaining six benchmark groups are built by us and consist of programs 
featuring mutable references. The groups inc-max, just-rec and linger-dec 
are based on the examples that have appeared in § 1 and § 3.4. The group 
swap-dec consists of programs that perform repeated involved updates via mutable
references to mutable references. The groups lists and trees feature
destructive updates on recursive data structures (lists and trees) via mutable
references; one interesting program from these groups is explained in § 3.4.

We conducted experiments on a commodity laptop (2.6GHz Intel Core i7
MacBook Pro with 16GB RAM). First, we translated each benchmark program
into CHCs in the SMT-LIB 2 format by RustHorn and SeaHorn (version
0.1.0-rc3) [23]. Both RustHorn and SeaHorn generated CHCs sufficiently
fast (about 0.1 second for each program). After that, we measured the time of
CHC solving by Spacer [40] in Z3 (version 4.8.7) [69] and HoIce (version 1.8.1)
[12,11] for the generated CHCs. SeaHorn’s outputs were not accepted by HoIce,
especially because SeaHorn generates CHCs with arrays. We also made modified
versions for some of SeaHorn’s CHC outputs, adding constraints on address
freshness, to improve accuracy of representations and reduce false alarms.²⁸


4.3 Experimental Results 


Table 1 shows the results of the experiments. 

Interestingly, the combination of RustHorn and HoIce succeeded in verifying
many programs with recursive data types (lists and trees), although it
failed at difficult programs.²⁹ HoIce, unlike Spacer, can find models defined with
primitive recursive functions for recursive data types.³⁰


28 For base/3 and repeat/3 of inc-max, the address-taking parts were already removed, 
probably by inaccurate pointer analysis. 

29 For example, inc-some/2 takes two mutable references in a list and increments on 
them; inc-all-t destructively increments all elements in a tree. 

30 We used the latest version of HoIce, whose algorithm for recursive types is presented
in the full paper of [11].
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RustHorn SeaHorn w/Spacer 
Group Instance Property| w/Spacer w/Holce as is modified 
01 safe <0.1 <0.1 <0.1 
04-recursive safe 0.5 timeout 0.8 
simple 05-recursive unsafe <0.1 <0.1 <0.1 
06-loop safe timeout 0.1 timeout 
hhk2008 safe timeout 40.5 <0.1 
unique-scalar unsafe <0.1 <0.1 <0.1 
1 safe 0.2 <0.1 <0.1 
unsafe 0.2 <0.1 <0.1 
2 safe timeout 0.1 <0.1 
unsafe <0.1 <0.1 <0.1 
bne 3 safe <0.1 <0.1 <0.1 
ariniy Fea a 20 
x safe À <0. <U. 
dramond=1 unsafe <0.1 <0.1 <0.1 
; safe 0.2 <0.1 <0.1 
qianond=2 unsafe <0.1 <0.1 <0.1 
bade safe <0.1 <0.1 |false alarm <0.1 
unsafe <0.1 <0.1 <0.1 <0.1 
safe <01 <0.1 |false alarm 
: base/3 unsafe 0.1 <0.1 <0.1 
inc-max t safe 0.1 timeout |false alarm 0.1 
Spaa unsafe <0.1 0.4 <0.1 <0.1 
safe 0.2 timeout <0.1 
repeat /3 unsafe <0.1 1.3 <0.1 
faze safe <0.1 <0.1 false alarm <0.1 
unsafe 0.1 timeout <0.1 <0.1 
safe 0.2 timeout |false alarm <0.1 
base/3 unsafe 0.4 0.9 <0.1 0.1 
swap-dec A safe 0.1 0.5 |false alarm timeout 
exac unsafe <0.1 26.0 <0.1 <0.1 
t/3 safe timeout timeout |false alarm false alarm 
SxAG unsafe <0.1 0.4 <0.1 <0.1 
: safe <0.1 <0.1 <0.1 
Just=ree Dae unsafe <0.1 0.1 <0.1 
safe <0.1 <0.1 {false alarm 
hase unea e <0.1 0.1 j <0.1 
safe <0.1 <0.1 |false alarm 
i base/3 unsafe <0.1 7.0 <0.1 
linger-dec ra safe <0.1 <0.1 |false alarm 
ezac unsafe <0.1 0.2 <0.1 
safe <0.1 <0.1 |false alarm 
exact/3 unsafe <0.1 0.6 <0.1 
anpand safe |tool error <0.1 |false alarm 
PP unsafe |tool error 0.2 0.1 
inc-all safe tool error <0.1 |false alarm 
' unsafe |tool error 0.3 <0.1 
lists fee safe |tool error <0.1 |false alarm 
THeseome unsafe |tool error 0.3 0.1 
inc-some/2 safe |tool error timeout |false alarm 
unsafe [tool error 0.3 0.4 
d-t safe |tool error <0.1 timeout 
appen unsafe |tool error 0.3 0.1 
inceali-t safe |tool error timeout | timeout 
unsafe |tool error 0.1 <0.1 
trees EENE -t safe |tool error timeout | timeout 
Ther eons unsafe |tool error 0.3 0.1 
f safe |tool error timeout |false alarm 
inc-some/2-t Se 
unsafe |tool error 


Table 1. Benchmarks and experimental results on RustHorn and SeaHorn, with 
Spacer/Z3 and HoIce. “timeout” denotes timeout of 180 seconds; “false alarm” means
reporting ‘unsafe’ for a safe program; “tool error” is a tool error of Spacer, which
currently does not deal with recursive types well.

False alarms of SeaHorn for the last six groups are mainly due to problematic
approximation of SeaHorn for pointers and heap memories, as discussed in § 1.1.
On the modified CHC outputs of SeaHorn, five false alarms were erased and four
of them became successful. For the last four groups, unboundedly many memory
cells can be allocated, which imposes a fundamental challenge for SeaHorn’s
array-based approach as discussed in § 1.1.³¹ The combination of RustHorn and
HoIce took a relatively long time or reported timeout for some programs,
including unsafe ones, because HoIce is still an unstable tool compared to Spacer;
in general, automated CHC solving can be rather unstable.


5 Related Work 


CHC-based Verification of Pointer-Manipulating Programs. SeaHorn [23] is a 
representative existing tool for CHC-based verification of pointer-manipulating 
programs. It basically represents the heap memory as an array. Although some 
pointer analyses [24] are used to optimize the array representation of the heap, 
their approach suffers from the scalability problem discussed in §1.1, as confirmed 
by the experiments in § 4. Still, their approach is quite effective as automated 
verification, given that many real-world pointer-manipulating programs do not 
follow Rust-style ownership. 

Another approach is taken by JayHorn [37,36], which translates Java pro- 
grams (possibly using object pointers) to CHCs. They represent store invariants 
using special predicates pull and push. Although this allows faster reasoning 
about the heap than the array-based approach, it can suffer from more false 
alarms. We conducted a small experiment for JayHorn (0.6-alpha) on some of 
the benchmarks of § 4.2; unexpectedly, JayHorn reported ‘UNKNOWN’ (instead of 
‘SAFE’ or ‘UNSAFE’) for even simple programs such as the programs of the instance 
unique-scalar in simple and the instance basic in inc-max. 


Verification for Rust. Whereas we have presented the first CHC-based (fully au- 
tomated) verification method specially designed for Rust-style ownership, there 
have been a number of studies on other types of verification for Rust. 

RustBelt [32] aims to formally prove high-level safety properties for Rust 
libraries with unsafe internal implementation, using manual reasoning on the 
higher-order concurrent separation logic Iris [35,33] on the Coq Proof Assistant 
[15]. Although their framework is flexible, the automation of the reasoning on 
the framework is little discussed. The language design of our COR is affected by 
their formal calculus λRust.

Electrolysis [67] translates some subset of Rust into a purely functional pro- 
gramming language to manually verify functional correctness on Lean Theorem 
Prover [49]. Although it clears out pointers to get simple models like our ap- 
proach, Electrolysis’ applicable scope is quite limited, because it deals with mu- 
table references by simple static tracking of addresses based on lenses [20], not 


31 We also tried on Spacer JustRec,, the stack-pointer-based accurate representation 
of just_rec presented in § 1.1, but we got timeout of 180 seconds.

supporting even basic use cases such as dynamic selection of mutable references
(e.g. take_max in § 1.2) [66], which our method can easily handle. Our approach 
covers all usages of pointers of the safe core of Rust as discussed in § 3. 

A series of studies [27,3,17] conduct (semi-)automated verification on Rust
programs using Viper [50], a verification platform based on separation logic with 
fractional ownership. This approach can to some extent deal with unsafe code 
[27] and type traits [17]. Astrauskas et al. [3] conduct semi-automated verifi- 
cation (manually providing pre/post-conditions and loop invariants) on many 
realistic examples. Because Viper is based on fractional ownership, however, 
their platforms have to use concrete indexing on the memory for programs like 
take_max/inc_max. In contrast, our idea leverages borrow-based ownership, and 
it can be applied also to semi-automated verification as suggested in § 3.5. 

Several studies [65,4,44] employ bounded model checking on Rust programs,
especially with unsafe code. Our method can be applied to bounded model check- 
ing as discussed in § 3.5. 


Verification using Ownership. Ownership has been applied to a wide range of 
verification. It has been used for detecting race conditions on concurrent pro- 
grams [8,64] and analyzing the safety of memory allocation [63]. Separation logic 
based on ownership is also studied well [7,50,35]. Some verification platforms 
[14,5,21] support simple ownership. However, most prior studies on ownership- 
based verification are based on fractional or counting ownership. Verification 
under borrow-based ownership like Rust was little studied before our work. 


Prophecy Variables. Our idea of taking a future value to represent a mutable 
reference is linked to the notion of prophecy variables [1,68,34]. Jung et al. [34] 
propose a new Hoare-style logic with prophecy variables. In their logic, prophecy 
variables are not copyable, which is analogous to uncopyability of mutable ref- 
erences in Rust. This logic can probably be used for generalizing our idea as 
suggested in § 3.5. 


6 Conclusion 


We have proposed a novel method for CHC-based program verification, which 
represents a mutable reference as a pair of values, the current value and the 
future value at the time of release. We have formalized the method for a core 
language of Rust and proved its correctness. We have implemented a proto- 
type verification tool for a subset of Rust and confirmed the effectiveness of our 
approach. We believe that this study establishes the foundation of verification 
leveraging borrow-based ownership.

Abstract. We propose a novel logic, called Frame Logic (FL), that extends
first-order logic (with recursive definitions) using a construct Sp(·)
that captures the implicit supports of formulas, that is, the precise subset of
the universe upon which their meaning depends. Using such supports, we 
formulate proof rules that facilitate frame reasoning elegantly when the 
underlying model undergoes change. We show that the logic is expressive 
by capturing several data-structures and also exhibit a translation from 
a precise fragment of separation logic to frame logic. Finally, we design 
a program logic based on frame logic for reasoning with programs that 
dynamically update heaps that facilitates local specifications and frame 
reasoning. This program logic consists of both localized proof rules as 
well as rules that derive the weakest tightest preconditions in FL. 


Keywords: Program Verification, Program Logics, Heap Verification, First- 
Order Logic, First-Order Logic with Recursive Definitions 


1 Introduction 


Program logics for expressing and reasoning with programs that dynamically 
manipulate heaps are an active area of research. The research on separation logic
has argued convincingly that it is highly desirable to have localized logics that 
talk about small states (heaplets rather than the global heap), and the ability 
to do frame reasoning. Separation logic achieves this objective by having a tight 
heaplet semantics and using special operators, primarily a separating conjunction 
operator ∗ and a separating implication operator (the magic wand −∗).

In this paper, we ask a fundamental question: can classical logics (such as 
FOL and FOL with recursive definitions) be extended to support localized spec- 
ifications and frame reasoning? Can we utilize classical logics for reasoning effec- 
tively with programs that dynamically manipulate heaps, with the aid of local 
specifications and frame reasoning? 

The primary contribution of this paper is to endow a classical logic, namely 
first-order logic with recursive definitions (with least fixpoint semantics) with 
frames and frame reasoning.

A formula in first-order logic with recursive definitions (FO-RD) can be naturally
associated with a support: the subset of the universe that determines
its truth. By using a more careful syntax such as guarded quantification (which
continues to have a classical interpretation), we can in fact write specifications in
FO-RD that have very precise supports. For example, we can write the property 
that x points to a linked list using a formula list(x) written purely in FO-RD 
so that its support is precisely the locations constituting the linked list. 


In this paper, we define an extension of FO-RD, called Frame Logic (FL) 
where we allow a new operator Sp(α) which, for an FO-RD formula α, evaluates
to the support of α. Logical formulas thus have access to supports and can use
them to separate supports and do frame reasoning. For instance, the logic can now
express that two lists are disjoint by asserting that Sp(list(x)) ∩ Sp(list(y)) = ∅.
It can then reason that in such a program heap configuration, if the program 
manipulates only the locations in Sp(list(y)), then list(x) would continue to be 
true, using simple frame reasoning. 
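
The disjointness argument above can be illustrated on a concrete finite heap. The following is a minimal sketch and not the formal semantics of FL: it assumes a single pointer field stored in a Python dictionary (the names `eval_list` and `heap` are hypothetical), uses `None` for nil, and takes the support of list(x) to be the set of locations visited when following the pointer from x.

```python
def eval_list(nxt, x):
    """Return (truth value, support) of list(x) on the heap `nxt`.

    Assumption of this sketch: the support is the set of locations
    reached from x by following `nxt`; list(x) holds iff nil (None)
    is reached without revisiting a location.
    """
    seen = set()
    while x is not None and x not in seen:
        seen.add(x)
        x = nxt[x]
    return x is None, frozenset(seen)


# Two disjoint lists: 1 -> 2 -> nil and 3 -> 4 -> nil.
heap = {1: 2, 2: None, 3: 4, 4: None}
x_holds, sp_x = eval_list(heap, 1)
y_holds, sp_y = eval_list(heap, 3)
disjoint = sp_x.isdisjoint(sp_y)  # Sp(list(x)) and Sp(list(y)) do not overlap

# A program step that mutates only a location inside Sp(list(y)).
heap_after = dict(heap)
heap_after[3] = None

# Frame reasoning: list(x) and its support are unaffected by the update.
x_after = eval_list(heap_after, 1)
```

Since the update touches only location 3, which lies outside Sp(list(x)), re-evaluating list(x) on `heap_after` yields the same truth value and the same support as before.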


The addition of the support operator to FO-RD yields a very natural logic 
for expressing specifications. First, formulas in FO-RD have the same meaning 
when viewed as FL formulae. For example, f(x) = y (written in FO-RD as 
well as in FL) is true in any model that has x mapped by f to y, instead of a 
specialized “tight heaplet semantics” that demands that f be a partial function 
with the domain only consisting of the location x. The fact that the support of 
this formula contains only the location x is important, of course, but is made 
accessible using the support operator, i.e., Sp( f(x) = y) gives the set containing 
the sole element interpreted for x. Second, properties of supports can be naturally 
expressed using set operations. To state that the lists pointed to by x and y are 
disjoint, we don’t need special operators (such as the * operator in separation 
logic) but can express this as Sp(list(x)) ∩ Sp(list(y)) = ∅. Third, when used to
annotate programs, pre/post specifications for programs written in FL can be 
made implicitly local by interpreting their supports to be the localized heaplets 
accessed and modified by programs, yielding frame reasoning akin to program 
logics that use separation logic. Finally, as we show in this paper, the weakest 
precondition of specifications across basic loop-free paths can be expressed in 
FL, making it an expressive logic for reasoning with programs. Separation logic, 
on the other hand, introduces the magic wand operator −∗ (which is inherently
higher-order) in order to add enough expressiveness to be closed under weakest 
preconditions [38]. 

We define frame logic (FL) as an extension of FO with recursive definitions 
(FO-RD) that operates over a multi-sorted universe, with a particular foreground 
sort (used to model locations on the heap on which pointers can mutate) and 
several background sorts that are defined using separate theories. Supports for 
formulas are defined with respect to the foreground sort only. A special back- 
ground sort of sets of elements of the foreground sort is assumed and is used 
to model the supports for formulas. For any formula φ in the logic, we have a
special construct Sp(φ) that captures its support, a set of locations in the
foreground sort, that intuitively corresponds to the precise subdomain of functions
the value of φ depends on. We then prove a frame theorem (Theorem 1) that
says that changing a model M by changing the interpretation of functions on
locations that are not in the support of φ will not affect the truth of the formula φ.
This theorem then directly supports frame reasoning; if a model satisfies φ and the
model is changed so that the changes made are disjoint from the support of φ, then
φ will continue to hold. We also show that FL formulae can be translated to
vanilla FO-RD logic (without support operators); in other words, the semantics 
for the support of a formula can be captured in FO-RD itself. Consequently, we 
can use any FO-RD reasoning mechanism (proof systems [19, 20] or heuristic 
algorithms such as the natural proof techniques [24, 32, 37, 41]) to reason with 
FL formulas. 
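
The statement of the frame theorem can be checked exhaustively on a tiny universe. The sketch below is only an illustration under simplifying assumptions: there is one mutable unary function f over three locations, formulas are encoded as nested tuples, and the names `ev_term`, `ev`, and `frame_theorem_holds` are hypothetical. It covers equality, negation, and conjunction only.

```python
from itertools import product


def ev_term(f, t):
    """(value, support) of a term: ('loc', a) is a constant, ('f', t1) applies f."""
    if t[0] == "loc":
        return t[1], frozenset()
    v, sp = ev_term(f, t[1])
    return f[v], sp | {v}  # f is dereferenced at location v


def ev(f, phi):
    """(truth value, support) of a formula built from eq / not / and."""
    tag = phi[0]
    if tag == "eq":
        (v1, s1), (v2, s2) = ev_term(f, phi[1]), ev_term(f, phi[2])
        return v1 == v2, s1 | s2
    if tag == "not":
        v, s = ev(f, phi[1])
        return (not v), s
    if tag == "and":
        (v1, s1), (v2, s2) = ev(f, phi[1]), ev(f, phi[2])
        return (v1 and v2), s1 | s2
    raise ValueError(tag)


def frame_theorem_holds(phi, universe):
    """Check: if g agrees with f on Sp(phi), then phi evaluates identically."""
    n = len(universe)
    for img_f in product(universe, repeat=n):
        f = dict(zip(universe, img_f))
        value, support = ev(f, phi)
        for img_g in product(universe, repeat=n):
            g = dict(zip(universe, img_g))
            if all(g[a] == f[a] for a in support) and ev(g, phi) != (value, support):
                return False
    return True


# phi = (f(0) = 1) and not (f(f(0)) = 0)
phi = ("and",
       ("eq", ("f", ("loc", 0)), ("loc", 1)),
       ("not", ("eq", ("f", ("f", ("loc", 0))), ("loc", 0))))
ok = frame_theorem_holds(phi, (0, 1, 2))
```

The check ranges over all 27 interpretations of f and, for each, over all interpretations g that agree with f on the computed support; both the truth value and the support are compared.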


We illustrate our logic using several examples drawn from program verifica- 
tion; we show how to express various data-structure definitions and the elements 
they contain and various measures for them using FL formulas (e.g., linked lists, 
sorted lists, list segments, binary search trees, AVL trees, lengths of lists, heights 
of trees, set of keys stored in the data-structure, etc.) 


While the sensibilities of our logic are definitely inspired by separation logic, 
there are some fundamental differences beyond the fact that our logic extends 
the syntax and semantics of classical logics with a special support operator 
and avoids operators such as ∗ and −∗. In separation logic, there can be many
supports of a formula (also called heaplets); a heaplet for a formula is one that
supports its truth. For example, a formula of the form α ∨ β can have a heaplet
that supports the truth of α or one that supports the truth of β. However,
the philosophy that we follow in our design is to have a single support that
supports the truth value of a formula, whether it be true or false. Consequently,
the support of the formula α ∨ β is the union of the supports of the formulas α
and β.


The above design choice of the support being determined by the formula has 
several consequences that lead to a deviation from separation logic. For instance, 
the support of the negation of a formula φ is the same as the support of φ. And
the support of the formula f(x) = y and its negation are the same, namely the 
singleton location interpreted for x. In separation logic, the corresponding for- 
mula will have the same heaplet but its negation will include all other heaplets. 
The choice of having determined supports or heaplets is not new, and there have 
been several variants and sublogics of separation logics that have been explored. 
For example, the logic DRYAD [32, 37] is a separation logic that insists on de- 
termined heaplets to support automated reasoning, and the precise fragment of 
separation logic studied in the literature [29] defines a sublogic that has (essen- 
tially) determined heaplets. The second main contribution in this paper is to 
show that this fragment of separation logic (with slight changes for technical 
reasons) can be translated to frame logic, such that the unique heaplet that 
satisfies a precise separation logic formula is the support of the corresponding
formula in frame logic. 


The third main contribution of this paper is a program logic based on frame 
logic for a simple while-programming language destructively updating heaps. We
present two kinds of proof rules for reasoning with such programs annotated with
pre- and post-conditions written in frame logic. The first set of rules are local 
rules that axiomatically define the semantics of the program, using the small- 
est supports for each command. We also give a frame rule that allows arguing 
preservation of properties whose supports are disjoint from the heaplet modified 
by a program. These rules are similar to analogous rules in separation logic. 
The second class of rules work to give a weakest tightest precondition for any 
postcondition with respect to non-recursive programs. In separation logic, the 
corresponding rules for weakest preconditions are often expressed using separat- 
ing implication (the magic-wand operator). Given a small change made to the 
heap and a postcondition β, the formula α −∗ β captures all heaplets H where
if a heaplet that satisfies α is joined with H, then β holds. When α describes
the change effected by the program, α −∗ β captures, essentially, the weakest
precondition. However, the magic wand is a very powerful operator that calls for
quantifications over heaplets and submodels, and hence involves second order
quantification. In our logic, we show that we can capture the weakest precondition
with only first-order quantification, and hence first-order frame logic is
closed under weakest preconditions across non-recursive program blocks. This
means that when inductive loop invariants are given also in FL, reasoning with 
programs reduces to reasoning with FL. By translating FL to pure FO-RD for- 
mulas, we can use FO-RD reasoning techniques to reason with FL, and hence 
programs. 


In summary, the contributions of this paper are: 


— A logic, called frame logic (FL) that extends FO-RD with a support operator 
and supports frame reasoning. We illustrate FL with specifications of various 
data-structures. We show a translation to equivalent formulas in FO-RD. 


— A program logic and proof system based on FL including local rules and rules 
for computing the weakest tightest precondition. FL reasoning required for 
proving programs is hence reducible to reasoning with FO-RD. 


— A separation logic fragment that can generate only precise formulas, and a 
translation from this logic to equivalent FL formulas. 


The paper is organized as follows. Section 2 sets up first-order logics with 
recursive definitions (FO-RD), with a special uninterpreted foreground sort of lo- 
cations and several background sorts/theories. Section 3 introduces Frame Logic 
(FL), its syntax, its semantics which includes a discussion of design choices for 
supports, proves the frame theorem for FL, shows a reduction of FL to FO-RD, 
and illustrates the logic by defining several data-structures and their properties 
using FL. Section 4 develops a program logic based on FL, illustrating it
with proofs of verification of programs. Section 5 introduces a precise fragment 
of separation logic and shows its translation to FL. Section 6 discusses com- 
parisons of FL to separation logic, and some existing first-order techniques that 
can be used to reason with FL. Section 7 compares our work with the research 
literature and Section 8 has concluding remarks.


2 Background: First-Order Logic with Recursive
Definitions and Uninterpreted Combinations of 
Theories 


The base logic upon which we build frame logic is a first order logic with recursive 
definitions (FO-RD), where we allow a foreground sort and several background 
sorts, each with their individual theories (like arithmetic, sets, arrays, etc.). The 
foreground sort and functions involving the foreground sort are uninterpreted 
(not constrained by theories). This hence can be seen as an uninterpreted com- 
bination of theories over disjoint domains. This logic has been defined and used 
to model heap verification before [23]. 

We will build frame logic over such a framework where supports are modeled 
as subsets of elements of the foreground sort. When modeling heaps in program 
verification using logic, the foreground sort will be used to model locations of the 
heap, uninterpreted functions from the foreground sort to foreground sort will 
be used to model pointers, and uninterpreted functions from the foreground sort 
to the background sort will model data fields. Consequently, supports will be 
subsets of locations of the heap, which is appropriate as these are the domains 
of pointers that change when a program updates a heap. 

We define a signature as Σ = (S; C; F; ℛ; ℐ), where S is a finite non-empty
set of sorts. C is a set of constant symbols, where each c ∈ C has some sort
τ ∈ S. F is a set of function symbols, where each function f ∈ F has a type of
the form τ_1 × ... × τ_m → τ for some m, with τ_i, τ ∈ S. The sets ℛ and ℐ are
(disjoint) sets of relation symbols, where each relation R ∈ ℛ ∪ ℐ has a type of
the form τ_1 × ... × τ_m. The set ℐ contains those relation symbols for which the
corresponding relations are inductively defined using formulas (details are given
below), while those in ℛ are given by the model.

We assume that the set of sorts contains a designated “foreground sort” 
denoted by σ_f. All the other sorts in S are called background sorts, and for
each such background sort σ we allow the constant symbols of type σ, function
symbols that have type σ^n → σ for some n, and relation symbols that have type σ^m
for some m, to be constrained using an arbitrary theory T_σ.

A formula in first-order logic with recursive definitions (FO-RD) over such a 
signature is of the form (D, α), where D is a set of recursive definitions of the
form R(x̄) := ρ_R(x̄), where R ∈ ℐ and ρ_R(x̄) is a first-order logic formula, in
which the relation symbols from ℐ occur only positively. α is also a first-order
logic formula over the signature. We assume D has at most one definition for any
inductively defined relation, and that the formulas ρ_R and α use only inductive
relations defined in D.

The semantics of a formula is standard; the semantics of inductively defined 
relations are defined to be the least fixpoint that satisfies the relational equations, 
and the semantics of α is the standard one defined using these semantics for
relations. We do not formally define the semantics, but we will formally define 
the semantics of frame logic (discussed in the next section and whose semantics 
is defined in the Technical Report [25]) which is an extension of FO-RD. 
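
To make the least fixpoint semantics described above concrete, the following sketch computes the least fixpoint of one positive recursive definition on a finite heap by iterating from the empty relation. It is an illustration only: the definition list(x) := (x = nil) ∨ list(next(x)), the dictionary encoding of next, and the name `lfp_list` are assumptions of this example.

```python
def lfp_list(nxt):
    """Least fixpoint of  list(x) := (x = nil) or list(next(x)).

    `nxt` maps each location to its successor; None plays the role of nil.
    The relation symbol occurs only positively, so the induced operator is
    monotone and iterating from the empty set reaches the least fixpoint.
    """
    universe = set(nxt) | {None}
    current = set()
    while True:
        new = {x for x in universe if x is None or nxt[x] in current}
        if new == current:
            return current
        current = new


# Locations 1 and 2 form a nil-terminated list; 3 and 4 form a cycle.
heap = {1: 2, 2: None, 3: 4, 4: 3}
lists = lfp_list(heap)
```

On this heap the iteration stabilizes with nil, 2, and 1 in the relation. The cyclic locations 3 and 4 are excluded, which is the difference between the least fixpoint and the greatest fixpoint (the latter would also admit the cycle).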
3 Frame Logic


We now define Frame Logic (FL), the central contribution of this paper. 


FL formulas: φ ::= t_τ = t_τ | R(t_τ1, ..., t_τm) | φ ∧ φ | ¬φ | ite(γ : φ, φ) | ∃y : γ. φ
             τ ∈ S, R ∈ ℛ ∪ ℐ of type τ_1 × ... × τ_m

Guards:      γ ::= t_τ = t_τ | R(t_τ1, ..., t_τm) | γ ∧ γ | ¬γ | ite(γ : γ, γ) | ∃y : γ. γ
             τ ∈ S \ {σ_S(f)}, R ∈ ℛ of type τ_1 × ... × τ_m

Terms:       t_τ ::= c | x | f(t_τ1, ..., t_τm) | ite(γ : t_τ, t_τ)
                   | Sp(φ) (if τ = σ_S(f)) | Sp(t_τ') (if τ = σ_S(f))
             τ, τ' ∈ S with constants c, variables x of type τ,
             and functions f of type τ_1 × ... × τ_m → τ

Recursive definitions: R(x̄) := ρ_R(x̄) with R ∈ ℐ of type τ_1 × ... × τ_m with
             τ_i ∈ S \ {σ_S(f)}, FL formula ρ_R(x̄) where all relation symbols
             R' ∈ ℐ occur only positively or inside a support expression.


Fig. 1. Syntax of frame logic: γ for guards, t_τ for terms of sort τ, and general formulas
φ. Guards cannot use inductively defined relations or support expressions.


We consider a universe with a foreground sort and several background sorts, 
each restricted by individual theories, as described in Section 2. We consider the 
elements of the foreground sort to be locations and consider supports as sets 
of locations, i.e., sets of elements of the foreground sort. We hence introduce a 
background sort σ_S(f); the elements of sort σ_S(f) model sets of elements of sort σ_f.
Among the relation symbols in ℛ there is the relation ∈ of type σ_f × σ_S(f) that
is interpreted as the usual element relation. The signature includes the standard
operations on sets ∪, ∩ with the usual meaning, the unary function ~ that is
interpreted as the complement on sets (with respect to the set of foreground
elements), and the constant ∅. For these functions and relations we assume a
background theory B_{σ_S(f)} that is an axiomatization of the theory of sets. We
further assume that the signature does not contain any other function or relation
symbols involving the sort σ_S(f).

For reasoning about changes of the structure over the locations, we assume 
that there is a subset F_m ⊆ F of function symbols that are declared mutable.
These functions can be used to model mutable pointer fields in the heap that
can be manipulated by a program and thus change. Formally, we require that
each f ∈ F_m has at least one argument of sort σ_f.

For variables, let Var_τ denote the set of variables of sort τ, where τ ∈ S. We
let x̄ abbreviate tuples x_1, ..., x_n of variables.

Our frame logic over uninterpreted combinations of theories is a variant of 
first-order logic with recursive definitions that has an additional operator Sp(φ)
that assigns to each formula φ a set of elements (its support or “heaplet” in the
context of heaps) in the foreground universe. So Sp(φ) is a term of sort σ_S(f).

The intended semantics of Sp(φ) (and of the inductive relations) is defined
formally as a least fixpoint of a set of equations. This semantics is presented
in Section 3.3. In the following, we first define the syntax of the logic, then
discuss informally the various design decisions for the semantics of supports,
before proceeding to a formal definition of the semantics.


3.1 Syntax of Frame Logic (FL) 


The syntax of our logic is given in the grammar in Figure 1. This extends FO-RD 
with the rule for building support expressions, which are terms of sort σ_S(f) of
the form Sp(α) for a formula α, or Sp(t) for a term t.

The formulas defined by γ are used as guards in existential quantification and
in the if-then-else-operator, which is denoted by ite. The restriction compared to
general formulas is that guards cannot use inductively defined relations (R ranges
only over ℛ in the rule for γ, and over ℛ ∪ ℐ in the rule for φ), nor terms of sort
σ_S(f) and thus no support expressions (τ ranges over S \ {σ_S(f)} in the rules for γ
and over S in the rule for φ). The requirement that the guard does not use the
inductive relations and support expressions is used later to ensure the existence
of least fixpoints for defining semantics of inductive definitions. The semantics of
an ite-formula ite(γ : α, β) is the same as the one of (γ ∧ α) ∨ (¬γ ∧ β); however, the
supports of the two formulas will turn out to be different (i.e., Sp(ite(γ : α, β))
and Sp((γ ∧ α) ∨ (¬γ ∧ β)) are different), as explained in Section 3.2. The same
is true for existential formulas, i.e., ∃y : γ. φ has the same semantics as ∃y. γ ∧ φ
but, in general, has a different support.

For recursive definitions (throughout the paper, we use the terms recursive 
definitions and inductive definitions with the same meaning), we require that 
the relation R that is defined does not have arguments of sort σ_S(f). This is
another restriction in order to ensure the existence of a least fixpoint model in
the definition of the semantics.¹


3.2 Semantics of Support Expressions: Design Decisions 


We discuss the design decisions that go behind the semantics of the support 
operator Sp in our logic, and then give an example for the support of an inductive 
definition. The formal conditions that the supports should satisfy are stated in 
the equations in Figure 2, and are explained in Section 3.3. Here, we start by an 
informal discussion. 

The first decision is to have every formula uniquely define a support, which 
roughly captures the subdomain of mutable functions that a formula φ’s truthhood
depends on, and have Sp(φ) evaluate to it.

The choice of supports for atomic formulae is relatively clear. An atomic
formula of the kind f(x)=y, where x is of the foreground sort and f is a mutable 
function, has as its support the singleton set containing the location interpreted 


1 It would be sufficient to restrict formulas of the form R(t_1, ..., t_n) for inductive
relations R to not contain support expressions as subterms.

for x. And atomic formulas that do not involve mutable functions over the foreground
have an empty support. Supports for terms can also be similarly defined.
The support of a conjunction α ∧ β should clearly be the union of the supports
of the two formulas.


Remark 1. In traditional separation logic, each pointer field is stored in a sep- 
arate location, using integer offsets. However, in our work, we view pointers as 
references and disallow pointer arithmetic. A more accurate heaplet for such 
references can be obtained by taking heaplet to be the pair (x, f) (see [30]), cap- 
turing the fact that the formula depends only on the field f of x. Such accurate 
heaplets can be captured in FL as well— we can introduce a non-mutable field 
lookup pointer L_f and use x.L_f.f in programs instead of x.f.


What should the support of a formula α ∨ β be? The choice we make here is
that its support is the union of the supports of α and β. Note that in a model
where α is true and β is false, we still include the heaplet of β in Sp(α ∨ β). In a
sense, this is an overapproximation of the support as far as frame reasoning goes,
as surely preserving the model’s definitions on the support of α will preserve the
truth of α, and hence of α ∨ β.

However, we prefer the support to be the union of the supports of α and β.
We think of the support as the subdomain of the universe that determines the
meaning of the formula, whether it be true or false. Consequently, we would like
the support of a formula and its negation to be the same. Given that the support
of the negation of a disjunction, being a conjunction, is the union of the frames
of α and β, we would like this to be the support.
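
The informal choices discussed so far can be summarized as equations. This is only a restatement of the prose above for the propositional connectives and for an atomic formula over a mutable function f; the formal conditions are those of Figure 2.

```latex
\begin{align*}
Sp(f(x) = y)            &= \{x\}
  && \text{($f$ mutable, $x$ of the foreground sort)}\\
Sp(\alpha \wedge \beta) &= Sp(\alpha) \cup Sp(\beta)\\
Sp(\alpha \vee \beta)   &= Sp(\alpha) \cup Sp(\beta)\\
Sp(\neg\varphi)         &= Sp(\varphi)
\end{align*}
```

With these equations, the support of ¬(α ∨ β) and of the equivalent conjunction ¬α ∧ ¬β coincide, as required by the design goal that a formula and its negation have the same support.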

Separation logic makes a different design decision. Logical formulas are not 
associated with tight supports, but rather, the semantics of the formula is defined 
for models with given supports/heaplets, where the idea of a heaplet is whether 
it supports the truthhood of a formula (and not its falsehood). For example, 
for a model, the various heaplets that satisfy ¬(f(x) = y) in separation logic
would include all heaplets where the location of x is not present, which does 
not coincide with the notion we have chosen for supports. However, for positive 
formulas, separation logic handles supports more accurately, as it can associate 
several supports for a formula, yielding two heaplets for formulas of the form 
α ∨ β when they are both true in a model. The decision to have a single support
for a formula compels us to take the union of the supports to be the support of 
a disjunction. 

There are situations, however, where there are disjunctions α ∨ β, where only
one of the disjuncts can possibly be true, and hence we would like the support
of the formula to be the support of the disjunct that happens to be true. We
therefore introduce a new syntactical form ite(γ : α, β) in frame logic, whose
heaplet is the union of the supports of γ and α, if γ is true, and the supports
of γ and β if γ is false. While the truthhood of ite(γ : α, β) is the same as that
of (γ ∧ α) ∨ (¬γ ∧ β), its supports are potentially smaller, allowing us to write
formulas with tighter supports to support better frame reasoning. Note that the
support of ite(γ : α, β) and its negation ite(γ : ¬α, ¬β) are the same, as we
desired.
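
The difference between the ite form and the equivalent disjunction can be observed on a small heap. The sketch below is an illustration under assumptions: one mutable unary function f given as a dictionary, formulas encoded as nested tuples, and hypothetical helper names `ev_term` and `ev`.

```python
def ev_term(f, t):
    """(value, support) of a term: ('loc', a) is a constant, ('f', t1) applies f."""
    if t[0] == "loc":
        return t[1], frozenset()
    v, sp = ev_term(f, t[1])
    return f[v], sp | {v}


def ev(f, phi):
    """(truth value, support) of a formula given as a nested tuple."""
    tag = phi[0]
    if tag == "eq":
        (v1, s1), (v2, s2) = ev_term(f, phi[1]), ev_term(f, phi[2])
        return v1 == v2, s1 | s2
    if tag == "not":
        v, s = ev(f, phi[1])
        return (not v), s
    if tag in ("and", "or"):
        (v1, s1), (v2, s2) = ev(f, phi[1]), ev(f, phi[2])
        value = (v1 and v2) if tag == "and" else (v1 or v2)
        return value, s1 | s2  # both operands always contribute
    if tag == "ite":
        g, sg = ev(f, phi[1])
        v, s = ev(f, phi[2] if g else phi[3])
        return v, sg | s       # only the guard and the chosen branch contribute
    raise ValueError(tag)


f = {0: 1, 1: 1, 2: 0}
gamma = ("eq", ("f", ("loc", 0)), ("loc", 1))  # f(0) = 1, true,  support {0}
alpha = ("eq", ("f", ("loc", 1)), ("loc", 1))  # f(1) = 1, true,  support {1}
beta = ("eq", ("f", ("loc", 2)), ("loc", 2))   # f(2) = 2, false, support {2}

ite_result = ev(f, ("ite", gamma, alpha, beta))
disj_result = ev(f, ("or", ("and", gamma, alpha), ("and", ("not", gamma), beta)))
neg_result = ev(f, ("ite", gamma, ("not", alpha), ("not", beta)))
```

Both formulas are true on this heap, but the ite form has support {0, 1} while the disjunction has support {0, 1, 2}. The negated ite has the opposite truth value and the same support as the ite.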
Turning to quantification, the support for a formula of the form ∃z. α is hard
to define, as its truthhood could depend on the entire universe. We hence provide 
a mechanism for guarded quantification, in the form dz: y. a. The semantics 
of this formula is that there exists some location that satisfies the guard y, for 
which a holds. The support for such a formula includes the support of the guard, 
and the supports of a when z is interpreted to be a location that satisfies y. For 
example, Jx : (x = f(y)). g(x) = z has as its support the locations interpreted 
for y and f(y) only. 
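
As a worked illustration of this example, the support can be computed step by step from the equations for support expressions given in Figure 2. The derivation below is a sketch under explicit assumptions that are not stated in the text: $f$ and $g$ are both mutable functions, $y$ and $z$ are variables, and $a$, $b$ are names introduced here for $\nu(y)$ and the value of $f$ at $a$.

```latex
% Assumptions (illustrative): f, g \in F_m; \nu(y) = a; [\![f]\!]_M(a) = b.
\begin{align*}
% The guard contributes the same support for every candidate u of x:
[\![\mathit{Sp}(x = f(y))]\!]_M(\nu[x \leftarrow u])
  &= [\![\mathit{Sp}(x)]\!]_M(\nu[x \leftarrow u]) \cup [\![\mathit{Sp}(f(y))]\!]_M(\nu[x \leftarrow u])
   = \emptyset \cup \{a\} = \{a\} \\
% The guard x = f(y) holds only for u = b, so the body is inspected only there:
[\![\mathit{Sp}(g(x) = z)]\!]_M(\nu[x \leftarrow b])
  &= [\![\mathit{Sp}(g(x))]\!]_M(\nu[x \leftarrow b]) \cup [\![\mathit{Sp}(z)]\!]_M(\nu[x \leftarrow b])
   = \{b\} \cup \emptyset = \{b\} \\
% Union of the guard supports and the body support at the witness:
[\![\mathit{Sp}(\exists x : (x = f(y)).\, g(x) = z)]\!]_M(\nu)
  &= \{a\} \cup \{b\} = \{\nu(y),\, [\![f(y)]\!]_{M,\nu}\}
\end{align*}
```

The result agrees with the claim above: only the locations interpreted for $y$ and $f(y)$ enter the support.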

For a formula $R(\bar{t})$ with an inductive relation $R$ defined by $R(\bar{x}) := \rho_R(\bar{x})$, 
the support descends into the definition, changing the variable assignment of the 
variables in $\bar{x}$ from the inductive definition to the terms in $\bar{t}$. Furthermore, it 
contains the elements to which mutable functions are applied in the terms in $\bar{t}$. 

Recursive definitions are designed such that the evaluation of the equations 
for the support expressions is independent of the interpretation of the inductive 
relations. The equations mainly depend on the syntactic structure of formulas 
and terms. Only the semantics of guards, and the semantics of subterms under 
a mutable function symbol play a role. For this reason, we disallow guards to 
contain recursively defined relations or support expressions. We also require that 
the only functions involving the sort $\sigma_{S(f)}$ are the standard functions involving 
sets. Thus, subterms of mutable functions cannot contain support expressions 
(which are of sort $\sigma_{S(f)}$) as subterms. 

These restrictions ensure that there indeed exists a unique simultaneous least 
solution of the equations for the inductive relations and the support expressions. 

We end this section with an example. 


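Example 1. Consider the definition of a predicate tree(x) w.r.t. two unary mutable 
functions left and right: 

\[
\begin{aligned}
\mathit{tree}(x) :=\; & \mathit{ite}(x = \mathit{nil} : \mathit{true}, \alpha) \quad \text{where} \\
\alpha =\; & \exists \ell, r : (\ell = \mathit{left}(x) \land r = \mathit{right}(x)).\; \mathit{tree}(\ell) \land \mathit{tree}(r) \;\land \\
 & \mathit{Sp}(\mathit{tree}(\ell)) \cap \mathit{Sp}(\mathit{tree}(r)) = \emptyset \;\land\; \lnot\big(x \in \mathit{Sp}(\mathit{tree}(\ell)) \cup \mathit{Sp}(\mathit{tree}(r))\big)
\end{aligned}
\]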


This inductive definition defines binary trees with pointer fields left and right 
for left- and right-pointers, by stating that x points to a tree if either x is equal 
to nil (in this case its support is empty), or left(x) and right(x) are trees with 
disjoint supports. The last conjunct says that x does not belong to the support 
of the left and right subtrees; this condition is, strictly speaking, not required to 
define trees (under least fixpoint semantics). Note that the access to the support 
of formulas eases defining disjointness of heaplets, like in separation logic. The 
support of tree(x) turns out to be precisely the nodes that are reachable from 
x using left and right pointers, as one would desire. Consequently, if a pointer 
outside this support changes, we would be able to conclude using frame reasoning 
that the truth value of tree(x) does not change. 
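
The following minimal Python sketch illustrates Example 1 on a concrete finite heap. It is not taken from the paper: the dictionary encoding of `left` and `right`, the use of `None` for nil, and the function names `tree_support` and `is_tree` are all assumptions made for illustration. The support is computed as the least solution of the support equations (the nodes reachable from x), and the final lines mimic frame reasoning by mutating a pointer outside that support.

```python
NIL = None  # stands for the constant nil (illustrative encoding)

def tree_support(x, left, right):
    """Least solution of the support equations for tree(x):
    the nodes reachable from x via left and right."""
    support, todo = set(), [x]
    while todo:
        u = todo.pop()
        if u is NIL or u in support:
            continue
        support.add(u)               # left and right are applied to u
        todo += [left[u], right[u]]
    return support

def is_tree(x, left, right, on_path=frozenset()):
    """Truth value of tree(x) under least-fixpoint semantics."""
    if x is NIL:
        return True
    if x in on_path:                 # a cycle admits no finite derivation
        return False
    l, r = left[x], right[x]
    path = on_path | {x}
    if not (is_tree(l, left, right, path) and is_tree(r, left, right, path)):
        return False
    sl, sr = tree_support(l, left, right), tree_support(r, left, right)
    return sl.isdisjoint(sr) and x not in (sl | sr)

# A heap with a tree rooted at 1 and an unrelated node 4 pointing into it.
left = {1: 2, 2: NIL, 3: NIL, 4: NIL}
right = {1: 3, 2: NIL, 3: NIL, 4: 1}
before = (is_tree(1, left, right), tree_support(1, left, right))
right[4] = 2                         # mutation outside Sp(tree(1))
after = (is_tree(1, left, right), tree_support(1, left, right))
```

Here `before` and `after` coincide, as the frame reasoning described above predicts; changing a pointer inside the support (for example `right[3] = 2`, which makes the subtrees share a node) can change the truth value.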


3.3 Formal Semantics of Frame Logic 


Before we explain the semantics of the support expressions and inductive defini- 
tions, we introduce a semantics that treats support expressions and the symbols 
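
\[
\begin{aligned}
[\![\mathit{Sp}(c)]\!]_M(\nu) &= [\![\mathit{Sp}(x)]\!]_M(\nu) = \emptyset \quad \text{for a constant } c \text{ or variable } x \\
[\![\mathit{Sp}(f(t_1,\ldots,t_n))]\!]_M(\nu) &=
  \begin{cases}
    \bigcup_{i \,:\, t_i \text{ of sort } \sigma_f} \{[\![t_i]\!]_{M,\nu}\} \;\cup\; \bigcup_{i=1}^{n} [\![\mathit{Sp}(t_i)]\!]_M(\nu) & \text{if } f \in F_m \\
    \bigcup_{i=1}^{n} [\![\mathit{Sp}(t_i)]\!]_M(\nu) & \text{if } f \notin F_m
  \end{cases} \\
[\![\mathit{Sp}(\mathit{Sp}(\varphi))]\!]_M(\nu) &= [\![\mathit{Sp}(\varphi)]\!]_M(\nu) \\
[\![\mathit{Sp}(\mathit{Sp}(t))]\!]_M(\nu) &= [\![\mathit{Sp}(t)]\!]_M(\nu) \\
[\![\mathit{Sp}(t_1 = t_2)]\!]_M(\nu) &= [\![\mathit{Sp}(t_1)]\!]_M(\nu) \cup [\![\mathit{Sp}(t_2)]\!]_M(\nu) \\
[\![\mathit{Sp}(R(t_1,\ldots,t_n))]\!]_M(\nu) &= \bigcup_{i=1}^{n} [\![\mathit{Sp}(t_i)]\!]_M(\nu) \quad \text{for } R \in \mathcal{R} \\
[\![\mathit{Sp}(R(\bar{t}))]\!]_M(\nu) &= [\![\mathit{Sp}(\rho_R(\bar{x}))]\!]_M(\nu[\bar{x} \leftarrow [\![\bar{t}]\!]_{M,\nu}]) \;\cup\; \bigcup_{i=1}^{n} [\![\mathit{Sp}(t_i)]\!]_M(\nu) \\
 &\qquad \text{for } R \in \mathcal{I} \text{ with definition } R(\bar{x}) := \rho_R(\bar{x}),\ \bar{t} = (t_1,\ldots,t_n) \\
[\![\mathit{Sp}(\alpha \land \beta)]\!]_M(\nu) &= [\![\mathit{Sp}(\alpha)]\!]_M(\nu) \cup [\![\mathit{Sp}(\beta)]\!]_M(\nu) \\
[\![\mathit{Sp}(\lnot\varphi)]\!]_M(\nu) &= [\![\mathit{Sp}(\varphi)]\!]_M(\nu) \\
[\![\mathit{Sp}(\mathit{ite}(\gamma : \alpha, \beta))]\!]_M(\nu) &= [\![\mathit{Sp}(\gamma)]\!]_M(\nu) \;\cup\;
  \begin{cases}
    [\![\mathit{Sp}(\alpha)]\!]_M(\nu) & \text{if } M,\nu \models \gamma \\
    [\![\mathit{Sp}(\beta)]\!]_M(\nu) & \text{if } M,\nu \not\models \gamma
  \end{cases} \\
[\![\mathit{Sp}(\mathit{ite}(\gamma : t_1, t_2))]\!]_M(\nu) &= [\![\mathit{Sp}(\gamma)]\!]_M(\nu) \;\cup\;
  \begin{cases}
    [\![\mathit{Sp}(t_1)]\!]_M(\nu) & \text{if } M,\nu \models \gamma \\
    [\![\mathit{Sp}(t_2)]\!]_M(\nu) & \text{if } M,\nu \not\models \gamma
  \end{cases} \\
[\![\mathit{Sp}(\exists y : \gamma. \varphi)]\!]_M(\nu) &= \bigcup_{u \in D_y} [\![\mathit{Sp}(\gamma)]\!]_M(\nu[y \leftarrow u]) \;\cup\; \bigcup_{\substack{u \in D_y \\ M,\nu[y \leftarrow u] \models \gamma}} [\![\mathit{Sp}(\varphi)]\!]_M(\nu[y \leftarrow u])
\end{aligned}
\]

Fig. 2. Equations for support expressions 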


from $\mathcal{I}$ as uninterpreted symbols. We refer to this semantics as uninterpreted 
semantics. For the formal definition we need to introduce some terminology first. 

An occurrence of a variable x in a formula is free if it does not occur under 
the scope of a quantifier for x. By renaming variables we can assume that each 
variable only occurs freely in a formula or is quantified by exactly one quantifier 
in the formula. We write $\varphi(x_1,\ldots,x_k)$ to indicate that the free variables of $\varphi$ are 
among $x_1,\ldots,x_k$. Substitution of a term $t$ for all free occurrences of variable $x$ in 
a formula $\varphi$ is denoted $\varphi[t/x]$. Multiple variables are substituted simultaneously 
as $\varphi[t_1/x_1,\ldots,t_n/x_n]$. We abbreviate this by $\varphi[\bar{t}/\bar{x}]$. 

A model is of the form $M = (U; [\![\cdot]\!]_M)$ where $U = (U_\sigma)_{\sigma \in S}$ contains a universe 
for each sort, and $[\![\cdot]\!]_M$ is an interpretation function. The universe for the sort $\sigma_{S(f)}$ 
is the powerset of the universe for $\sigma_f$. 

A variable assignment is a function v that assigns to each variable a concrete 
element from the universe for the sort of the variable. For a variable x, we write 
$D_x$ for the universe of the sort of $x$ (the domain of $x$). For a variable $x$ and an 
element $u \in D_x$ we write $\nu[x \leftarrow u]$ for the variable assignment that is obtained 
from $\nu$ by changing the value assigned for $x$ to $u$. 

The interpretation function $[\![\cdot]\!]_M$ maps each constant $c$ of sort $\sigma$ to an element 
$[\![c]\!]_M \in U_\sigma$, each function symbol $f : \tau_1 \times \ldots \times \tau_m \to \tau$ to a concrete 
function $[\![f]\!]_M : U_{\tau_1} \times \ldots \times U_{\tau_m} \to U_\tau$, and each relation symbol $R \in \mathcal{R} \cup \mathcal{I}$ of 
type $\tau_1 \times \ldots \times \tau_m$ to a concrete relation $[\![R]\!]_M \subseteq U_{\tau_1} \times \ldots \times U_{\tau_m}$. These interpretations 
are assumed to satisfy the background theories (see Section 2). Furthermore, 
the interpretation function maps each expression of the form $\mathit{Sp}(\varphi)$ to a 
function $[\![\mathit{Sp}(\varphi)]\!]_M$ that assigns to each variable assignment $\nu$ a set $[\![\mathit{Sp}(\varphi)]\!]_M(\nu)$ 
of foreground elements. The set $[\![\mathit{Sp}(\varphi)]\!]_M(\nu)$ corresponds to the support of the 
formula when the free variables are interpreted by $\nu$. Similarly, $[\![\mathit{Sp}(t)]\!]_M$ is a 
function from variable assignments to sets of foreground elements. 

Based on such models, we can define the semantics of terms and formulas in 
the standard way. The only construct that is non-standard in our logic are terms 
of the form Sp(y), for which the semantics is directly given by the interpretation 
function. We write $[\![t]\!]_{M,\nu}$ for the interpretation of a term $t$ in $M$ with variable 
assignment $\nu$. With this convention, $[\![\mathit{Sp}(\varphi)]\!]_M(\nu)$ denotes the same thing as 
$[\![\mathit{Sp}(\varphi)]\!]_{M,\nu}$. As usual, we write $M,\nu \models \varphi$ to indicate that the formula $\varphi$ is true 
in $M$ with the free variables interpreted by $\nu$, and $[\![\varphi(\bar{x})]\!]_M$ denotes the relation 
defined by the formula $\varphi$ with free variables $\bar{x}$. 

We refer to the above semantics as the uninterpreted semantics of $\varphi$ because 
we do not give a specific meaning to inductive definitions and support expressions. 

Now let us define the true semantics for FL. The relation symbols $R \in \mathcal{I}$ 
represent inductively defined relations, which are defined by equations of the 
form $R(\bar{x}) := \rho_R(\bar{x})$ (see Figure 1). In the intended meaning, $R$ is interpreted as 
the least relation that satisfies the equation 

\[
[\![R]\!]_M = [\![\rho_R(\bar{x})]\!]_M .
\]

The usual requirement for the existence of a unique least fixpoint of the equation 
is that the definition of $R$ does not negatively depend on $R$. For this reason, we 
require that in $\rho_R(\bar{x})$ each occurrence of an inductive predicate $R' \in \mathcal{I}$ is either 
inside a support expression, or it occurs under an even number of negations.$^2$ 

Every support expression is evaluated on a model to a set of foreground el- 
ements (under a given variable assignment v). Formally, we are interested in 
models in which the support expressions are interpreted to be the sets that cor- 
respond to the smallest solution of the equations given in Figure 2. The intuition 
behind these definitions was explained in Section 3.2. 


Example 2. Consider the inductive definition tree(x) defined in Example 1. To 
check whether the equations from Figure 2 indeed yield the desired support, 
note that $\mathit{Sp}(x = \mathit{nil}) = \mathit{Sp}(x) = \mathit{Sp}(\mathit{true}) = \emptyset$. Below, we write 
$[u]$ for a variable assignment that assigns $u$ to the free variable of the formula 
that we are considering. Then we obtain that $\mathit{Sp}(\mathit{tree}(x))[u] = \emptyset$ if $u = \mathit{nil}$, and 
$\mathit{Sp}(\mathit{tree}(x))[u] = \mathit{Sp}(\alpha)[u]$ if $u \neq \mathit{nil}$. The formula $\alpha$ is existentially quantified 
with guard $\ell = \mathit{left}(x) \land r = \mathit{right}(x)$. The support of this guard is $\{u\}$ because 
mutable functions are applied to $x$. The support of the remaining part of $\alpha$ is the 
union of the supports of $\mathit{tree}(\ell)[\mathit{left}(u)]$ and $\mathit{tree}(r)[\mathit{right}(u)]$ (the assignments for 
$\ell$ and $r$ that make the guard true). So we obtain for the case that $u \neq \mathit{nil}$ that 
the element $u$ enters the support, and the recursion further descends into the 
subtrees of $u$, as desired. 


2 As usual, it would be sufficient to forbid negative occurrences of inductive predicates 
in mutual recursion. 

A frame model is a model in which the interpretation of the inductive relations 
and of the support expressions corresponds to the least solution of the 
respective equations (see the Technical Report [25] for a rigorous formalisation). 


Proposition 1. For each model M, there is a unique frame model over the 
same universe and the same interpretation of the constants, functions, and non- 
inductive relations. 


3.4 A Frame Theorem 


The support of a formula can be used for frame reasoning in the following sense: 
if we modify a model M by changing the interpretation of the mutable functions 
(e.g., a program modifying pointers), then truth values of formulas do not change 
if the change happens outside the support of the formula. This is formalized 
below and proven in the Technical Report [25]. 

Given two models $M$, $M'$ over the same universe, we say that $M'$ is a mutation 
of $M$ if $[\![R]\!]_M = [\![R]\!]_{M'}$, $[\![c]\!]_M = [\![c]\!]_{M'}$, and $[\![f]\!]_M = [\![f]\!]_{M'}$, for all constants $c$, 
relations $R \in \mathcal{R}$, and functions $f \in F \setminus F_m$. In other words, $M$ can only be 
different from $M'$ on the interpretations of the mutable functions, the inductive 
relations, and the support expressions. 

Given a subset $X \subseteq U_{\sigma_f}$ of the elements from the foreground universe, we say 
that the mutation is stable on $X$ if the values of the mutable functions did not 
change on arguments from $X$, that is, $[\![f]\!]_M(u_1,\ldots,u_n) = [\![f]\!]_{M'}(u_1,\ldots,u_n)$ for 
all mutable functions $f \in F_m$ and all appropriate tuples $u_1,\ldots,u_n$ of arguments 
with $\{u_1,\ldots,u_n\} \cap X \neq \emptyset$. 


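Theorem 1 (Frame Theorem). Let $M$, $M'$ be frame models such that $M'$ is 
a mutation of $M$ that is stable on $X \subseteq U_{\sigma_f}$, and let $\nu$ be a variable assignment. 
Then $M,\nu \models \alpha$ iff $M',\nu \models \alpha$ for all formulas $\alpha$ with $[\![\mathit{Sp}(\alpha)]\!]_M(\nu) \subseteq X$, and 
$[\![t]\!]_{M,\nu} = [\![t]\!]_{M',\nu}$ for all terms $t$ with $[\![\mathit{Sp}(t)]\!]_M(\nu) \subseteq X$. 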


3.5 Reduction from Frame Logic to FO-RD 


The only extension of frame logic compared to FO-RD is the operator Sp, which 
defines a function from interpretations of free variables to sets of foreground 
elements. The semantics of this operator can be captured within FO-RD itself, 
so reasoning within frame logic can be reduced to reasoning within FO-RD. 

A formula $\alpha(\bar{y})$ with $\bar{y} = y_1,\ldots,y_m$ has one support for each interpretation 
of the free variables. We capture these supports by an inductively defined 
relation $\mathit{Sp}_\alpha(\bar{y}, z)$ of arity $m + 1$ such that for each frame model $M$, we have 
$(u_1,\ldots,u_m,u) \in [\![\mathit{Sp}_\alpha]\!]_M$ if $u \in [\![\mathit{Sp}(\alpha)]\!]_M(\nu)$ for the interpretation $\nu$ that interprets 
$y_i$ as $u_i$. 

Since the semantics of $\mathit{Sp}(\alpha)$ is defined over the structure of $\alpha$, we introduce 
corresponding inductively defined relations $\mathit{Sp}_\beta$ and $\mathit{Sp}_t$ for all subformulas $\beta$ 
and subterms $t$ of either $\alpha$ or of a formula $\rho_R$ for $R \in \mathcal{I}$. 
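
\[
\begin{aligned}
\mathit{list}(x) :=\; & \mathit{ite}(x = \mathit{nil}, \mathit{true}, \exists z : z = \mathit{next}(x).\ \mathit{list}(z) \land x \notin \mathit{Sp}(\mathit{list}(z))) && \text{(linked list)} \\
\mathit{dll}(x) :=\; & \mathit{ite}(x = \mathit{nil} : \top, \mathit{ite}(\mathit{next}(x) = \mathit{nil} : \top, \exists z : z = \mathit{next}(x). \\
 & \quad \mathit{prev}(z) = x \land \mathit{dll}(z) \land x \notin \mathit{Sp}(\mathit{dll}(z)))) && \text{(doubly linked list)} \\
\mathit{lseg}(x, y) :=\; & \mathit{ite}(x = y : \top, \exists z : z = \mathit{next}(x).\ \mathit{lseg}(z, y) \land x \notin \mathit{Sp}(\mathit{lseg}(z, y))) && \text{(linked list segment)} \\
\mathit{length}(x, n) :=\; & \mathit{ite}(x = \mathit{nil} : n = 0, \exists z : z = \mathit{next}(x).\ \mathit{length}(z, n - 1)) && \text{(length of list)} \\
\mathit{slist}(x) :=\; & \mathit{ite}(x = \mathit{nil} : \top, \mathit{ite}(\mathit{next}(x) = \mathit{nil}, \top, \exists z : z = \mathit{next}(x). \\
 & \quad \mathit{key}(x) < \mathit{key}(z) \land \mathit{slist}(z) \land x \notin \mathit{Sp}(\mathit{slist}(z)))) && \text{(sorted list)} \\
\mathit{mkeys}(x, M) :=\; & \mathit{ite}(x = \mathit{nil} : M = \emptyset, \exists z, M_1 : z = \mathit{next}(x). \\
 & \quad M = M_1 \cup_m \{\mathit{key}(x)\} \land \mathit{mkeys}(z, M_1) \land x \notin \mathit{Sp}(\mathit{mkeys}(z, M_1))) && \text{(multiset of keys in linked list)} \\
\mathit{btree}(x) :=\; & \mathit{ite}(x = \mathit{nil} : \top, \exists \ell, r : \ell = \mathit{left}(x) \land r = \mathit{right}(x). \\
 & \quad \mathit{btree}(\ell) \land \mathit{btree}(r) \land x \notin \mathit{Sp}(\mathit{btree}(\ell)) \land x \notin \mathit{Sp}(\mathit{btree}(r)) \;\land \\
 & \quad \mathit{Sp}(\mathit{btree}(\ell)) \cap \mathit{Sp}(\mathit{btree}(r)) = \emptyset) && \text{(binary tree)} \\
\mathit{bst}(x) :=\; & \mathit{ite}(x = \mathit{nil} : \top, \mathit{ite}(\mathit{left}(x) = \mathit{nil} \land \mathit{right}(x) = \mathit{nil} : \top, \mathit{ite}(\mathit{left}(x) = \mathit{nil} : \\
 & \quad \exists r : r = \mathit{right}(x).\ \mathit{key}(x) < \mathit{key}(r) \land \mathit{bst}(r) \land x \notin \mathit{Sp}(\mathit{bst}(r)), \\
 & \quad \mathit{ite}(\mathit{right}(x) = \mathit{nil} : \exists \ell : \ell = \mathit{left}(x).\ \mathit{key}(\ell) < \mathit{key}(x) \land \mathit{bst}(\ell) \land x \notin \mathit{Sp}(\mathit{bst}(\ell)), \\
 & \quad \exists \ell, r : \ell = \mathit{left}(x) \land r = \mathit{right}(x).\ \mathit{key}(x) < \mathit{key}(r) \land \mathit{key}(\ell) < \mathit{key}(x) \;\land \\
 & \quad \mathit{bst}(\ell) \land \mathit{bst}(r) \land x \notin \mathit{Sp}(\mathit{bst}(\ell)) \land x \notin \mathit{Sp}(\mathit{bst}(r)) \;\land \\
 & \quad \mathit{Sp}(\mathit{bst}(\ell)) \cap \mathit{Sp}(\mathit{bst}(r)) = \emptyset)))) && \text{(binary search tree)} \\
\mathit{height}(x, n) :=\; & \mathit{ite}(x = \mathit{nil} : n = 0, \exists \ell, r, n_1, n_2 : \ell = \mathit{left}(x) \land r = \mathit{right}(x). \\
 & \quad \mathit{height}(\ell, n_1) \land \mathit{height}(r, n_2) \land \mathit{ite}(n_1 > n_2 : n = n_1 + 1, n = n_2 + 1)) && \text{(height of binary tree)} \\
\mathit{bfac}(x, b) :=\; & \mathit{ite}(x = \mathit{nil} : 0, \exists \ell, r, n_1, n_2 : \ell = \mathit{left}(x) \land r = \mathit{right}(x). \\
 & \quad \mathit{height}(\ell, n_1) \land \mathit{height}(r, n_2) \land b = n_2 - n_1) && \text{(balance factor (for AVL tree))} \\
\mathit{avl}(x) :=\; & \mathit{ite}(x = \mathit{nil} : \top, \exists \ell, r : \ell = \mathit{left}(x) \land r = \mathit{right}(x). \\
 & \quad \mathit{avl}(\ell) \land \mathit{avl}(r) \land \mathit{bfac}(x) \in \{-1, 0, 1\} \;\land \\
 & \quad x \notin \mathit{Sp}(\mathit{avl}(\ell)) \cup \mathit{Sp}(\mathit{avl}(r)) \land \mathit{Sp}(\mathit{avl}(\ell)) \cap \mathit{Sp}(\mathit{avl}(r)) = \emptyset) && \text{(avl tree)} \\
\mathit{ttree}(x) :=\; & \mathit{pttree}(x, \mathit{nil}) && \text{(threaded tree)} \\
\mathit{pttree}(x, p) :=\; & \mathit{ite}(x = \mathit{nil} : \top, \exists \ell, r : \ell = \mathit{left}(x) \land r = \mathit{right}(x). \\
 & \quad ((r = \mathit{nil} \land \mathit{tnext}(x) = p) \lor (r \neq \mathit{nil} \land \mathit{tnext}(x) = r)) \;\land \\
 & \quad \mathit{pttree}(\ell, x) \land \mathit{pttree}(r, p) \land x \notin \mathit{Sp}(\mathit{pttree}(\ell, x)) \cup \mathit{Sp}(\mathit{pttree}(r, p)) \;\land \\
 & \quad \mathit{Sp}(\mathit{pttree}(\ell, x)) \cap \mathit{Sp}(\mathit{pttree}(r, p)) = \emptyset) && \text{(threaded tree auxiliary definition)}
\end{aligned}
\]

Fig. 3. Example definitions of data-structures and other predicates in Frame Logic 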

The equations for supports from Figure 2 can be expressed by inductive definitions 
for the relations $\mathit{Sp}_\beta$. The translations are shown in the Technical Report [25]. 
It is not hard to see that general frame logic formulas can be translated 
to FO-RD formulas that make use of these new inductively defined relations. 


Proposition 2. For every frame logic formula there is an equisatisfiable FO- 
RD formula with the signature extended by auxiliary predicates for recursive 
definitions of supports. 


3.6 Expressing Data-Structures Properties in FL 


We now present the formulation of several data-structures and properties about 
them in FL. Figure 3 depicts formulations of singly- and doubly-linked lists, 
list segments, lengths of lists, sorted lists, the multiset of keys stored in a list 
(assuming a background sort of multisets), binary trees, their heights, and AVL 
trees. In all these definitions, the support operator plays a crucial role. We also 
present a formulation of single threaded binary trees (adapted from [7]), which are 
binary trees where, apart from tree-edges, there is a pointer tnext that connects 
every tree node to the inorder successor in the tree; these pointers go from leaves 
to ancestors arbitrarily far away in the tree, making it a nontrivial definition. 

We believe that FL formulas naturally and succinctly express these data- 
structures and their properties, making it an attractive logic for annotating 
programs. 


4 Programs and Proofs 


In this section, we develop a program logic for a while-programming language 
that can destructively update heaps. We assume that location variables are de- 
noted by variables of the form x and y, whereas variables that denote other 
data (which would correspond to the background sorts in our logic) are denoted 
by v. We omit the grammar to construct background terms and formulas, and 
simply denote such ‘background expressions’ with be and clarify the sort when 
it is needed. Finally, we assume that our programs are written in Single Static 
Assignment (SSA) form, which means that every variable is assigned to at most 
once in the program text. The grammar for our programming language is in 
Figure 4. 


S ::= x := c | x := y | x := y.f | v := be | x.f := y 
    | alloc(x) | free(x) | if be then S else S | while be do S | S ; S 


Fig. 4. Grammar of while programs. c is a constant location, f is a field pointer, and 
be is a background expression. In our logic, we model every field f as a function f() 
from locations to the appropriate sort. 

4.1 Operational Semantics 


A configuration C is of the form (M,H,U) where M contains interpretations 
for the store and the heap. The store is a partial map that interprets variables, 
constants, and non-mutable functions (a function from location variables to lo- 
cations) and the heap is a total map on the domain of locations that interprets 
mutable functions (a function from pointers and locations to locations). H is a 
subset of locations denoting the set of allocated locations, and U is a subset of 
locations denoting a subset of unallocated locations that can be allocated in the 
future. We introduce a special configuration $\bot$ that the program transitions to 
when it dereferences a variable not in H. 


A configuration (M, H,U) is valid if all variables of the location sort map 
only to locations not in U, locations in H do not point to any location in U, 
and U is a subset of the complement of H that does not contain nil or the 
locations mapped to by the variables. We denote this by valid(M, H,U). Initial 
configurations and reachable configurations of any program will be valid. 


The transition of configurations on various commands that manipulate the 
store and heap are defined in the natural way. Allocation adds a new location 
from U into H with pointer-fields defaulting to nil and default data fields. See 
the Technical Report [25] for more details. 
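
As a rough illustration of the configurations and transitions described above, the following Python sketch models a configuration, the validity check, and the four heap commands. It is a simplification under stated assumptions, not the formal semantics of the Technical Report: locations are integers with `0` standing for nil, only pointer fields are modelled, `alloc` picks the smallest location in U (assumed non-empty), and `free` on an unallocated location is assumed to abort. The names `Config`, `valid`, `lookup`, `mutate`, `alloc`, and `free` are hypothetical.

```python
from dataclasses import dataclass

NIL = 0          # location standing for nil (assumption: locations are ints)
ABORT = "abort"  # stands for the special abort configuration

@dataclass
class Config:
    store: dict  # variable name -> location
    heap: dict   # (field, location) -> location; missing entries read as NIL
    H: set       # allocated locations
    U: set       # unallocated locations that may be allocated later

def valid(c, fields):
    """Validity as described in the text, restricted to pointer fields."""
    vars_ok = all(loc not in c.U for loc in c.store.values())
    heap_ok = all(c.heap.get((f, l), NIL) not in c.U for l in c.H for f in fields)
    return vars_ok and heap_ok and c.U.isdisjoint(c.H) and NIL not in c.U

def lookup(c, x, y, f):              # x := y.f
    if c.store[y] not in c.H:
        return ABORT                 # dereference outside H
    c.store[x] = c.heap.get((f, c.store[y]), NIL)
    return c

def mutate(c, x, f, y):              # x.f := y
    if c.store[x] not in c.H:
        return ABORT
    c.heap[(f, c.store[x])] = c.store[y]
    return c

def alloc(c, x, fields):             # alloc(x); assumes U is non-empty
    loc = sorted(c.U)[0]
    c.U.remove(loc)
    c.H.add(loc)
    for f in fields:
        c.heap[(f, loc)] = NIL       # pointer fields default to nil
    c.store[x] = loc
    return c

def free(c, x):                      # free(x); aborting here is an assumption
    if c.store[x] not in c.H:
        return ABORT
    c.H.remove(c.store[x])
    return c
```

For instance, starting from `Config(store={"x": 1}, heap={("next", 1): NIL}, H={1}, U={2, 3})`, allocating `z`, executing `x.next := z`, and then freeing `z` leaves a configuration in which a later lookup through `z` returns `ABORT`.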


4.2 Triples and Validity 


We express specifications of programs using triples of the form $\{\alpha\} S \{\beta\}$ where 
$\alpha$ and $\beta$ are FL formulae and $S$ is a program. The formulae are, however, 
restricted: for simplicity, we disallow atomic relations on locations, and functions 
with arity greater than one. We also disallow functions from a background 
sort to the foreground sort (see Section 3). Lastly, quantified formulae can have 
supports as large as the entire heap. However, our program logic covers a more 
practical fragment without compromising expressivity. Thus, we require guards 
in quantification to be of the form $f(z') = z$ or $z \in U$ ($z$ is the quantified 
variable). 


We define a triple to be valid if every valid configuration with heaplet being 
precisely the support of $\alpha$, when acted on by the program, yields a configuration 
with heaplet being the support of $\beta$. More formally, a triple is valid if for every 
valid configuration $(M, H, U)$ such that $M \models \alpha$ and $H = [\![\mathit{Sp}(\alpha)]\!]_M$: 

- it is never the case that the abort state $\bot$ is encountered in the execution 
  on $S$. 

- if $(M, H, U)$ transitions to $(M', H', U')$ on $S$, then $M' \models \beta$ and 
  $H' = [\![\mathit{Sp}(\beta)]\!]_{M'}$. 
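
4.3 Program Logic 


First, we define a set of local rules and rules for conditionals, while, sequence, 
consequence, and framing: 

\[
\begin{array}{ll}
\text{Assignment:} & \{\mathit{true}\}\ x := y\ \{x = y\} \qquad \{\mathit{true}\}\ x := c\ \{x = c\} \\[4pt]
\text{Lookup:} & \{f(y) = f(y)\}\ x := y.f\ \{x = f(y)\} \\[4pt]
\text{Mutation:} & \{f(x) = f(x)\}\ x.f := y\ \{f(x) = y\} \\[4pt]
\text{Allocation:} & \{\mathit{true}\}\ \mathit{alloc}(x)\ \{\bigwedge_{f \in F} f(x) = \mathit{def}_f\} \\[4pt]
\text{Deallocation:} & \{f(x) = f(x)\}\ \mathit{free}(x)\ \{\mathit{true}\} \\[8pt]
\text{Conditional:} & \dfrac{\{be \land \alpha\}\ S\ \{\beta\} \qquad \{\lnot be \land \alpha\}\ T\ \{\beta\}}{\{\alpha\}\ \text{if } be \text{ then } S \text{ else } T\ \{\beta\}} \\[12pt]
\text{While:} & \dfrac{\{\alpha \land be\}\ S\ \{\alpha\}}{\{\alpha\}\ \text{while } be \text{ do } S\ \{\lnot be \land \alpha\}} \\[12pt]
\text{Sequence:} & \dfrac{\{\alpha\}\ S\ \{\beta\} \qquad \{\beta\}\ T\ \{\mu\}}{\{\alpha\}\ S\,;\,T\ \{\mu\}} \\[12pt]
\text{Consequence:} & \dfrac{\alpha' \Rightarrow \alpha \qquad \{\alpha\}\ S\ \{\beta\} \qquad \beta \Rightarrow \beta' \qquad \mathit{Sp}(\alpha) = \mathit{Sp}(\alpha') \qquad \mathit{Sp}(\beta) = \mathit{Sp}(\beta')}{\{\alpha'\}\ S\ \{\beta'\}} \\[12pt]
\text{Frame:} & \dfrac{\mathit{Sp}(\alpha) \cap \mathit{Sp}(\mu) = \emptyset \qquad \{\alpha\}\ S\ \{\beta\}}{\{\alpha \land \mu\}\ S\ \{\beta \land \mu\}} \quad \mathit{vars}(S) \cap \mathit{fv}(\mu) = \emptyset
\end{array}
\]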


The above rules are intuitively clear and are similar to the local rules in 
separation logic [38]. The rules for statements capture their semantics using 
minimal/tight heaplets, and the frame rule allows proving triples with larger 
heaplets. In the rule for alloc, the postcondition says that the newly allocated 
location has default values for all pointer fields and datafields (denoted as $\mathit{def}_f$). 
The soundness of the frame rule relies crucially on the frame theorem for FL 
(Theorem 1). The full soundness proof can be found in the Technical Report [25]. 


Theorem 2. The above rules are sound with respect to the operational seman- 
tics. 


4.4 Weakest-Precondition Proof Rules 


We now turn to the much more complex problem of designing rules that give 
weakest preconditions for arbitrary postconditions, for loop-free programs. In 
separation logic, such rules resort to using the magic-wand operator $-\!\!*$ [12, 27, 
28, 38], a complex operator whose semantics calls for 
second-order quantification over arbitrarily large submodels. In our setting, our 
main goal is to show that FL is itself capable of expressing weakest preconditions 
of postconditions written in FL. 

First, we define a notion of Weakest Tightest Precondition (WTP) of a formula 
$\beta$ with respect to each command in our operational semantics. To define 
this notion, we first define a preconfiguration, and use that definition to define 
weakest tightest preconditions: 


Definition 1. The preconfigurations corresponding to a valid configuration $(M, H, U)$ 
with respect to a program $S$ are a set of valid configurations of the form $(M_p, H_p, U_p)$ 
(with $M_p$ being a model, $H_p$ a subuniverse of the locations in $M_p$, and $U_p$ 
being unallocated locations) such that when $S$ is executed on $M_p$ with unallocated 
set $U_p$ it dereferences only locations in $H_p$ and results (using the operational 
semantics rules) in $(M, H, U)$ or gets stuck (no transition is available). That is: 

\[
\begin{aligned}
\mathit{preconfigurations}((M, H, U), S) = \{ & (M_p, H_p, U_p) \mid \mathit{valid}(M_p, H_p, U_p) \text{ and } (M_p, H_p, U_p) \xRightarrow{S} (M, H, U) \\
 & \text{or } (M_p, H_p, U_p) \text{ gets stuck on } S \}
\end{aligned}
\]

Definition 2. $\alpha$ is a WTP of a formula $\beta$ with respect to a program $S$ if 

\[
\begin{aligned}
 & \{(M_p, H_p, U_p) \mid M_p \models \alpha,\ H_p = [\![\mathit{Sp}(\alpha)]\!]_{M_p},\ \mathit{valid}(M_p, H_p, U_p)\} \\
 & \quad = \{\mathit{preconfigurations}((M, H, U), S) \mid M \models \beta,\ H = [\![\mathit{Sp}(\beta)]\!]_M,\ \mathit{valid}(M, H, U)\}
\end{aligned}
\]


With the notion of weakest tightest preconditions, we define global program 
logic rules for each command of our language. In contrast to local rules, global 
specifications contain heaplets that may be larger than the smallest heap on 
which one can execute the command. 

Intuitively, a WTP of $\beta$ for lookup states that $\beta$ must hold in the precondition 
when $x$ is interpreted as $x'$, where $x' = f(y)$, and further that the location $y$ 
must belong to the support of $\beta$. The rules for mutation and allocation are 
more complex. For mutation, we define a transformation $MW^{x.f:=y}(\beta)$ that 
evaluates a formula $\beta$ in the pre-state as though it were evaluated in the post-state. 
We similarly define such a transformation $MW^{\mathit{alloc}(x)}_v$ for allocation. We 
will define these in detail later. Finally, the deallocation rule ensures $x$ is not in 
the support of the postcondition. The conjunct $f(x) = f(x)$ is provided to satisfy 
the tightness condition, ensuring the support of the precondition is the support 
of the postcondition with $x$ added. The rules can be seen below, and the proof 
of soundness for these global rules can be found in the Technical Report [25]. 


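\[
\begin{array}{ll}
\text{Assignment-G:} & \{\beta[y/x]\}\ x := y\ \{\beta\} \qquad \{\beta[c/x]\}\ x := c\ \{\beta\} \\[4pt]
\text{Lookup-G:} & \{\exists x' : x' = f(y).\ (\beta \land y \in \mathit{Sp}(\beta))[x'/x]\}\ x := y.f\ \{\beta\} \\
 & \text{(where } x' \text{ does not occur in } \beta\text{)} \\[4pt]
\text{Mutation-G:} & \{MW^{x.f:=y}(\beta \land x \in \mathit{Sp}(\beta))\}\ x.f := y\ \{\beta\} \\[4pt]
\text{Allocation-G:} & \{\forall v : (v \in U).\ (v \neq \mathit{nil} \Rightarrow MW^{\mathit{alloc}(x)}_v(\beta))\}\ \mathit{alloc}(x)\ \{\beta\} \\
 & \text{(for some fresh variable } v\text{)} \\[4pt]
\text{Deallocation-G:} & \{\beta \land x \notin \mathit{Sp}(\beta) \land f(x) = f(x)\}\ \mathit{free}(x)\ \{\beta\} \\
 & \text{(where } f \in F_m \text{ is an arbitrary (unary) mutable function)}
\end{array}
\]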

4.5 Definitions of MW Primitives 


Recall that the MW$^3$ primitives $MW^{x.f:=y}$ and $MW^{\mathit{alloc}(x)}_v$ need to evaluate a 
formula $\beta$ in the pre-state as it would evaluate in the post-state after mutation 
and allocation statements. The definition of $MW^{x.f:=y}$ is as follows: 

\[
MW^{x.f:=y}(\beta) = \beta[\lambda z.\ \mathit{ite}(z = x : \mathit{ite}(f(x) = f(x) : y, y), f(z)) / f]
\]

The $[\lambda z. \rho(z)/f]$ notation is shorthand for saying that each occurrence of a 
term of the form $f(t)$, where $t$ is a term, is substituted (recursively, from inside 
out) by the term $\rho(t)$. The precondition essentially evaluates $\beta$ taking into 
account $f$'s transformation, but we use the ite expression with a tautological 
guard $f(x) = f(x)$ (which has the support containing the singleton $x$) in order 
to preserve the support. The definition of $MW^{\mathit{alloc}(x)}_v$ is similar. Refer to the 
Technical Report [25] for details. 
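
To make the substitution concrete, here is a small Python sketch of the rewriting behind the mutation primitive on a toy term representation. The tuple-based AST (`("var", name)`, `("app", f, arg)`, `("eq", t1, t2)`, `("ite", guard, t1, t2)`), the evaluator `ev`, and the function name `mw_mutation` are all assumptions introduced for illustration; supports are not tracked, only values. The sketch checks the intended property on one instance: evaluating the rewritten term in the pre-state gives the value of the original term in the post-state.

```python
def mw_mutation(term, f, x, y):
    """Rewrite each f(t) to ite(t' = x : ite(f(x) = f(x) : y, y), f(t')),
    from the inside out, where t' is the rewritten argument."""
    tag = term[0]
    if tag == "var":
        return term
    if tag == "app":
        _, g, arg = term
        arg2 = mw_mutation(arg, f, x, y)
        if g != f:
            return ("app", g, arg2)
        fx = ("app", f, ("var", x))
        # Tautological guard f(x) = f(x); it matters only for the support.
        inner = ("ite", ("eq", fx, fx), ("var", y), ("var", y))
        return ("ite", ("eq", arg2, ("var", x)), inner, ("app", f, arg2))
    # "eq" and "ite" nodes: rewrite every child.
    return (tag,) + tuple(mw_mutation(c, f, x, y) for c in term[1:])

def ev(term, store, funcs):
    """Evaluate a term or equality in a store and an interpretation of functions."""
    tag = term[0]
    if tag == "var":
        return store[term[1]]
    if tag == "app":
        return funcs[term[1]][ev(term[2], store, funcs)]
    if tag == "eq":
        return ev(term[1], store, funcs) == ev(term[2], store, funcs)
    if tag == "ite":
        branch = term[2] if ev(term[1], store, funcs) else term[3]
        return ev(branch, store, funcs)
    raise ValueError(tag)

store = {"x": 1, "y": 3, "w": 1}
pre = {"next": {0: 0, 1: 2, 2: 0, 3: 0}}     # 0 plays the role of nil
post = {"next": {**pre["next"], 1: 3}}       # effect of x.next := y
beta = ("eq", ("app", "next", ("var", "w")), ("var", "y"))
rewritten = mw_mutation(beta, "next", "x", "y")
```

On this instance, `ev(rewritten, store, pre)` and `ev(beta, store, post)` agree, whereas `beta` itself is false in the pre-state.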


Theorem 3. The rules above suffixed with -G are sound w.r.t. the operational 
semantics. Moreover, each precondition corresponds to the weakest tightest 
precondition of $\beta$. 


4.6 Example 


In this section, we will see an example of using our program logic rules that we 
described earlier. This will demonstrate the utility of Frame Logic as a logic for 
annotating and reasoning with heap manipulating programs, as well as offer some 
intuition about how our program logic can be deployed in a practical setting. 
The following program performs in-place list reversal: j := nil ; while (i 
!= nil) do k := i.next ; i.next := j ; j := i ; i := k. For the sake 
of simplicity, instead of proving that this program reverses a list, we will instead 
prove the simpler claim that after executing this program j is a list. The recursive 
definition of list we use for this proof is the one from Figure 3: 

\[
\mathit{list}(x) := \mathit{ite}(x = \mathit{nil}, \mathit{true}, \exists z : z = \mathit{next}(x).\ \mathit{list}(z) \land x \notin \mathit{Sp}(\mathit{list}(z)))
\]

We need to also give an invariant for the while loop, simply stating that i 
and j point to disjoint lists: $\mathit{list}(i) \land \mathit{list}(j) \land \mathit{Sp}(\mathit{list}(i)) \cap \mathit{Sp}(\mathit{list}(j)) = \emptyset$. 

We prove that this is indeed an invariant of the while loop below. Our proof 
uses a mix of both local and global rules from Sections 4.3 and 4.4 above to 
demonstrate how either type of rule can be used. We also use the consequence 
rule along with the program rule to be applied in several places in order to 
simplify presentation. As a result, some detailed analysis is omitted, such as 
proving supports are disjoint in order to use the frame rule. 


$\{\mathit{list}(i) \land \mathit{list}(j) \land \mathit{Sp}(\mathit{list}(i)) \cap \mathit{Sp}(\mathit{list}(j)) = \emptyset \land i \neq \mathit{nil}\}$ (consequence rule) 


3 The acronym MW is a shout-out to the Magic-Wand operator, as these serve a 
similar function, except that they are definable in FL itself. 
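
\[
\begin{array}{l}
\{\mathit{list}(i) \land \mathit{list}(j) \land \mathit{Sp}(\mathit{list}(i)) \cap \mathit{Sp}(\mathit{list}(j)) = \emptyset \land i \neq \mathit{nil} \land i \notin \mathit{Sp}(\mathit{list}(j))\} \\
\qquad \text{(consequence rule: unfolding list definition)} \\
\{\exists k' : k' = \mathit{next}(i).\ \mathit{list}(k') \land i \notin \mathit{Sp}(\mathit{list}(k')) \land \mathit{list}(j) \\
\qquad \land\ i \notin \mathit{Sp}(\mathit{list}(j)) \land \mathit{Sp}(\mathit{list}(k')) \cap \mathit{Sp}(\mathit{list}(j)) = \emptyset\} \quad \text{(consequence rule)} \\
\{\exists k' : k' = \mathit{next}(i).\ \mathit{next}(i) = \mathit{next}(i) \land \mathit{list}(k') \land i \notin \mathit{Sp}(\mathit{list}(k')) \land \mathit{list}(j) \\
\qquad \land\ i \notin \mathit{Sp}(\mathit{list}(j)) \land \mathit{Sp}(\mathit{list}(k')) \cap \mathit{Sp}(\mathit{list}(j)) = \emptyset\} \\
\texttt{k := i.next ;} \quad \text{(consequence rule, lookup-G rule)} \\
\{\mathit{next}(i) = \mathit{next}(i) \land \mathit{list}(k) \land i \notin \mathit{Sp}(\mathit{list}(k)) \land \mathit{list}(j) \\
\qquad \land\ i \notin \mathit{Sp}(\mathit{list}(j)) \land \mathit{Sp}(\mathit{list}(k)) \cap \mathit{Sp}(\mathit{list}(j)) = \emptyset\} \\
\texttt{i.next := j ;} \quad \text{(mutation rule, frame rule)} \\
\{\mathit{next}(i) = j \land \mathit{list}(k) \land i \notin \mathit{Sp}(\mathit{list}(k)) \land \mathit{list}(j) \\
\qquad \land\ i \notin \mathit{Sp}(\mathit{list}(j)) \land \mathit{Sp}(\mathit{list}(k)) \cap \mathit{Sp}(\mathit{list}(j)) = \emptyset\} \quad \text{(consequence rule)} \\
\{\mathit{list}(k) \land \mathit{next}(i) = j \land i \notin \mathit{Sp}(\mathit{list}(j)) \land \mathit{list}(j) \land \mathit{Sp}(\mathit{list}(k)) \cap \mathit{Sp}(\mathit{list}(j)) = \emptyset\} \\
\qquad \text{(consequence rule: folding list definition)} \\
\{\mathit{list}(k) \land \mathit{list}(i) \land \mathit{Sp}(\mathit{list}(k)) \cap \mathit{Sp}(\mathit{list}(i)) = \emptyset\} \\
\texttt{j := i ; i := k} \quad \text{(assignment-G rule)} \\
\{\mathit{list}(i) \land \mathit{list}(j) \land \mathit{Sp}(\mathit{list}(i)) \cap \mathit{Sp}(\mathit{list}(j)) = \emptyset\}
\end{array}
\]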


Armed with this, proving j is a list after executing the full program above is 
a trivial application of the assignment, while, and consequence rules, which we 
omit for brevity. 
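
The invariant can also be checked dynamically on concrete heaps, which is a useful sanity test before attempting the proof. The Python sketch below is illustrative only: the dictionary `nxt` encoding the `next` field, `0` standing for nil, and the helper names `list_support`, `is_list`, `invariant`, and `reverse` are assumptions, and the check covers one run rather than all executions.

```python
NIL = 0  # location standing for nil (illustrative encoding)

def list_support(x, nxt):
    """Support of list(x): the locations reachable from x via next."""
    support = set()
    while x != NIL and x not in support:
        support.add(x)
        x = nxt[x]
    return support

def is_list(x, nxt):
    """Truth of list(x) under least-fixpoint semantics: x reaches nil
    without visiting any location twice."""
    seen = set()
    while x != NIL:
        if x in seen:
            return False
        seen.add(x)
        x = nxt[x]
    return True

def invariant(i, j, nxt):
    """list(i), list(j), and disjoint supports."""
    return (is_list(i, nxt) and is_list(j, nxt)
            and list_support(i, nxt).isdisjoint(list_support(j, nxt)))

def reverse(i, nxt):
    """The in-place reversal program, asserting the loop invariant."""
    j = NIL
    assert invariant(i, j, nxt)
    while i != NIL:
        k = nxt[i]       # k := i.next
        nxt[i] = j       # i.next := j
        j = i            # j := i
        i = k            # i := k
        assert invariant(i, j, nxt)
    return j

heap = {1: 2, 2: 3, 3: NIL}
head = reverse(1, heap)
```

After the call, `head` is the old last node and `heap` holds the reversed `next` pointers; the invariant assertions pass at every iteration.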

Observe that in the above proof we were able to apply the frame rule because of 
the fact that i belongs neither to $\mathit{Sp}(\mathit{list}(k))$ nor $\mathit{Sp}(\mathit{list}(j))$. This can be dispensed 
with easily using reasoning about first-order formulae with least-fixpoint 
definitions, techniques for which are discussed in Section 6. 

Also note the invariant of the loop is precisely the intended meaning of $\mathit{list}(i) * \mathit{list}(j)$ 
in separation logic. In fact, as we will see in Section 6, we can define a 
first-order macro Star as $\mathit{Star}(\varphi, \psi) = \varphi \land \psi \land \mathit{Sp}(\varphi) \cap \mathit{Sp}(\psi) = \emptyset$. We can use 
this macro to represent disjoint supports in similar proofs. 

These proofs demonstrate what proofs of actual programs look like in our 
program logic. They also show that frame logic and our program logic can prove 
many results similarly to traditional separation logic. Moreover, by using the derived 
operator Star, very little even in terms of verbosity is sacrificed in gaining the 
flexibility of Frame Logic (please see Section 6 for a broader discussion of the ways 
in which Frame Logic differs from Separation Logic and in certain situations 
offers many advantages in stating and reasoning with specifications/invariants). 


5 Expressing a Precise Separation Logic 


In this section, we show that FL is expressive by capturing a fragment of sep- 
aration logic in frame logic; the fragment is a syntactic fragment of separation 
logic that defines only precise formulas, that is, formulas that can be satisfied in at 
most one heaplet for any store. The translation also shows that frame logic can 
naturally and compactly capture such separation logic formulas. 


5.1 A Precise Separation Logic 


As discussed in Section 1, a crucial difference between separation logic and 
frame logic is that formulas in frame logic have uniquely determined supports/heaplets, 
while this is not true in separation logic. However, it is well 
known that in verification, determined heaplets are very natural (most uses of 
separation logic in fact are precise) and sometimes desirable. For instance, see [8] 
where precision is used crucially to give sound semantics to concurrent separation 
logic and [29] where precise formulas are proposed in verifying modular 
programs as imprecision causes ambiguity in function contracts. 

We define a fragment of separation logic that defines precise formulas (more 
accurately, we handle a slightly larger class inductively: formulas that when 
satisfiable have unique minimal heaplets for any given store). The fragment we 
capture is similar to the notion of precise predicates seen in [29]: 


Definition 3. PSL Fragment: 

- sf: formulas over the stack only (nothing dereferenced). Includes isatom?(), 
  m(x) = y for immutable m, true, background formulas, etc. 

- $x \xrightarrow{f} y$ 

- $\mathit{ite}(\mathit{sf}, \varphi_1, \varphi_2)$ where sf is from the first bullet 

- $\varphi_1 \land \varphi_2$ and $\varphi_1 * \varphi_2$ 

- $\mathcal{I}$ where $\mathcal{I}$ contains all unary inductive definitions $I$ that have unique heaplets 
  inductively (list, tree, etc.). In particular, the body $\rho_I$ of $I$ is a formula in 
  the PSL fragment ($\rho_I[I \leftarrow \varphi]$ is in the PSL fragment provided $\varphi$ is in the 
  PSL fragment). Additionally, for all $x$, if $s,h \models I(x)$ and $s,h' \models I(x)$, then 
  $h = h'$. 

- $\exists y.\ (x \xrightarrow{f} y) * \varphi_1$ 


Note that in the fragment negation and disjunction are disallowed, but mu- 
tually exclusive disjunction using ite is allowed. Existential quantification is only 
present when the topmost operator is a * and where one of the formulas guards 
the quantified variable uniquely. 

The semantics of this fragment follows the standard semantics of separation 
logic [12, 27, 28, 38], with the heaplet of $x \xrightarrow{f} y$ taken to be $\{x\}$. See Remark 1 
in Section 3.2 for a discussion of a more accurate heaplet for $x \xrightarrow{f} y$ being the set 
containing the pair $(x, f)$, and how this can be modeled in the above semantics 
by using field-lookups using non-mutable pointers. 


Theorem 4 (Minimum Heap). For any formula $\varphi$ in the PSL fragment, if 
there is an $s$ and $h$ such that $s,h \models \varphi$ then there is an $h_\varphi$ such that $s,h_\varphi \models \varphi$ 
and for all $h'$ such that $s,h' \models \varphi$, $h_\varphi \subseteq h'$. 


4 While we only assume unary inductive definitions here, we can easily generalize this 
to inductive definitions with multiple parameters. 

5.2 Translation to Frame Logic 


For a separation logic store and heap $s,h$ (respectively), we define the corresponding 
interpretation $M_{s,h}$ such that variables are interpreted according to $s$ 
and values of pointer functions on dom($h$) are interpreted according to $h$. For $\varphi$ 
in the PSL fragment, we first define a formula $P(\varphi)$, inductively, that captures 
whether $\varphi$ is precise. $\varphi$ is a precise formula iff, when it is satisfiable with a store 
$s$, there is exactly one $h$ such that $s,h \models \varphi$. The formula $P(\varphi)$ is in separation 
logic and will be used in the translation. To see why this formula is needed, 
consider the formula $\varphi_1 \land \mathit{ite}(\mathit{sf}, \varphi_2, \varphi_3)$. Assume that $\varphi_1$ is imprecise, $\varphi_2$ is precise, 
and $\varphi_3$ is imprecise. Under conditions where sf is true, the heaplets for $\varphi_1$ 
and $\varphi_2$ must align. However, when sf is false, the heaplets for $\varphi_1$ and $\varphi_3$ can 
be anything. Because we cannot initially know when sf will be true or false, we 
need this separation logic formula $P(\varphi)$ that is true exactly when $\varphi$ is precise. 


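Definition 4. Precision predicate P:

— $P(sf) = \bot$ and $P(x \xrightarrow{f} y) = \top$
— $P(\mathit{ite}(sf, \varphi_1, \varphi_2)) = (sf \wedge P(\varphi_1)) \vee (\neg sf \wedge P(\varphi_2))$
— $P(\varphi_1 \wedge \varphi_2) = P(\varphi_1) \vee P(\varphi_2)$
— $P(\varphi_1 * \varphi_2) = P(\varphi_1) \wedge P(\varphi_2)$
— $P(I) = \top$ where $I \in \mathcal{I}$ is an inductive predicate
— $P(\exists y.\ (x \xrightarrow{f} y) * \varphi_1) = P(\varphi_1)$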

Note that this definition captures precision within our fragment since stack 
formulae are imprecise and pointer formulae are precise. The argument for the
rest of the cases follows by simple structural induction.

Now we define the translation T inductively: 




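Definition 5. Translation from PSL to Frame Logic:

— $T(sf) = sf$ and $T(x \xrightarrow{f} y) = (f(x) = y)$
— $T(\mathit{ite}(sf, \varphi_1, \varphi_2)) = \mathit{ite}(T(sf), T(\varphi_1), T(\varphi_2))$
— $T(\varphi_1 \wedge \varphi_2) = T(\varphi_1) \wedge T(\varphi_2) \wedge T(P(\varphi_1)) \implies Sp(T(\varphi_2)) \subseteq Sp(T(\varphi_1))$
  $\wedge\ T(P(\varphi_2)) \implies Sp(T(\varphi_1)) \subseteq Sp(T(\varphi_2))$
— $T(\varphi_1 * \varphi_2) = T(\varphi_1) \wedge T(\varphi_2) \wedge Sp(T(\varphi_1)) \cap Sp(T(\varphi_2)) = \emptyset$
— $T(I) = T(\rho_I)$ where $\rho_I$ is the definition of the inductive predicate $I$ as in
Section 3.
— $T(\exists y.\ (x \xrightarrow{f} y) * \varphi_1) = \exists y : [f(x) = y].\ [T(\varphi_1) \wedge x \notin Sp(T(\varphi_1))]$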
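The two inductive definitions above can be read as a pair of recursive functions over the syntax of the PSL fragment. The following Python sketch is only an illustration under our own assumptions: the AST classes (`Stack`, `PointsTo`, `Ite`, `And`, `Sep`, `Ind`, `Exists`) and the functions `precise` and `translate` are hypothetical names, formulas are rendered as plain text, and an inductive predicate is kept by name on the assumption that its body is translated once, separately. Since P only ever produces true, false, stack formulas and Boolean connectives, the sketch uses the text of P directly where Definition 5 writes T(P(·)).

```python
from dataclasses import dataclass

# Hypothetical AST for the PSL fragment; all names here are illustrative.
@dataclass(frozen=True)
class Stack:            # stack formula sf, kept as opaque text
    text: str

@dataclass(frozen=True)
class PointsTo:         # x -f-> y
    x: str
    f: str
    y: str

@dataclass(frozen=True)
class Ite:              # ite(sf, then, other)
    cond: Stack
    then: object
    other: object

@dataclass(frozen=True)
class And:              # left /\ right
    left: object
    right: object

@dataclass(frozen=True)
class Sep:              # left * right
    left: object
    right: object

@dataclass(frozen=True)
class Ind:              # inductive predicate I(arg)
    name: str
    arg: str

@dataclass(frozen=True)
class Exists:           # exists y. (x -f-> y) * body
    y: str
    x: str
    f: str
    body: object


def precise(phi) -> str:
    """Precision predicate P (Definition 4), rendered as text."""
    if isinstance(phi, Stack):
        return "false"
    if isinstance(phi, (PointsTo, Ind)):
        return "true"
    if isinstance(phi, Ite):
        c = phi.cond.text
        return f"(({c} and {precise(phi.then)}) or (not ({c}) and {precise(phi.other)}))"
    if isinstance(phi, And):
        return f"({precise(phi.left)} or {precise(phi.right)})"
    if isinstance(phi, Sep):
        return f"({precise(phi.left)} and {precise(phi.right)})"
    if isinstance(phi, Exists):
        return precise(phi.body)
    raise TypeError(f"not in the PSL fragment: {phi!r}")


def translate(phi) -> str:
    """Translation T (Definition 5), rendered as text."""
    if isinstance(phi, Stack):
        return phi.text
    if isinstance(phi, PointsTo):
        return f"{phi.f}({phi.x}) = {phi.y}"
    if isinstance(phi, Ite):
        return f"ite({phi.cond.text}, {translate(phi.then)}, {translate(phi.other)})"
    if isinstance(phi, And):
        a, b = translate(phi.left), translate(phi.right)
        return (f"({a}) and ({b})"
                f" and ({precise(phi.left)} => Sp({b}) <= Sp({a}))"
                f" and ({precise(phi.right)} => Sp({a}) <= Sp({b}))")
    if isinstance(phi, Sep):
        a, b = translate(phi.left), translate(phi.right)
        return f"({a}) and ({b}) and Sp({a}) & Sp({b}) = {{}}"
    if isinstance(phi, Ind):
        # Assumption: the body of the definition is translated separately.
        return f"{phi.name}({phi.arg})"
    if isinstance(phi, Exists):
        body = translate(phi.body)
        return (f"exists {phi.y} : [{phi.f}({phi.x}) = {phi.y}]. "
                f"(({body}) and {phi.x} not in Sp({body}))")
    raise TypeError(f"not in the PSL fragment: {phi!r}")


phi = Sep(PointsTo("x", "n", "y"), Ind("list", "y"))
print(precise(phi))
print(translate(phi))
```

Here `<=` and `&` stand for set inclusion and intersection on supports, and `{}` for the empty set; a real implementation would build terms of the target logic rather than strings.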


Finally, recall that any formula $\varphi$ in the PSL fragment has a unique minimal
heap (Theorem 4). With this (and a few auxiliary lemmas that can be found in
the Technical Report [25]), we have the following theorem, which captures the
correctness of the translation:


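Theorem 5. For any formula $\varphi$ in the PSL fragment, we have the following
implications:

$$s, h \models \varphi \implies \mathcal{M}_{s,h} \models T(\varphi)$$

$$\mathcal{M}_{s,h} \models T(\varphi) \implies s, h' \models \varphi \quad \text{where } h' = \mathcal{M}_{s,h}(Sp(T(\varphi)))$$

Here, $\mathcal{M}_{s,h}(Sp(T(\varphi)))$ is the interpretation of $Sp(T(\varphi))$ in the model $\mathcal{M}_{s,h}$. Note
$h'$ is minimal and is equal to $h_\varphi$ as in Theorem 4.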
6 Discussion


Comparison with Separation Logic. The design of frame logic is, in many ways, 
inspired by the design choices of separation logic. Separation logic formulas implicitly
hold on tight heaplets: models are defined on pairs (s,h), where s is
a store (an interpretation of variables) and h is a heaplet that defines a subset 
of the heap as the domain for functions/pointers. In Frame Logic, we choose to 
not define satisfiability with respect to heaplets but define it with respect to the 
entire heap. However, we give access to the implicitly defined heaplet using the 
operator Sp, and give a logic over sets to talk about supports. The separating 
conjunction operation * can then be expressed using normal conjunction and a 
constraint that says that the supports of the formulae are disjoint.

We do not allow formulas to have multiple supports, which is crucial as Sp is 
a function, and this roughly corresponds to precise fragments of separation logic. 
Precise fragments of separation logic have already been proposed and accepted in 
the separation logic literature for giving robust handling of modular functions, 
concurrency, etc. [8, 29]. Section 5 details a translation of a precise fragment 
of separation logic (with * but not magic wand) to frame logic that shows the 
natural connection between precise formulas in separation logic and frame logic. 

Frame logic, through the support operator, facilitates local reasoning much 
in the same way as separation logic does, and the frame rule in frame logic 
supports frame reasoning in a similar way as separation logic. The key difference 
between frame logic and separation logic is the adherence to a first-order logic 
(with recursive definitions), both in terms of syntax and expressiveness. 

First and foremost, in separation logic, the magic wand is needed to express
the weakest precondition [38]. Consider for example computing the weakest
precondition of the formula $list(x)$ with respect to the code y.n := z. The weakest
precondition should essentially describe the (tight) heaplets such that changing
the n pointer from y to z results in x pointing to a list. In separation logic,
this is expressed typically (see [38]) using magic wand as $(y \xrightarrow{n} z) \mathrel{-\!\!*} (list(x))$.
However, the magic wand operator is inherently a second-order property. The
formula $\alpha \mathrel{-\!\!*} \beta$ holds on a heaplet $h$ if for any disjoint heaplet that satisfies $\alpha$,
$\beta$ will hold on the conjoined heaplet. Expressing this property (for arbitrary $\alpha$,
whose heaplet can be unbounded) requires quantifying over unbounded heaplets
satisfying $\alpha$, which is not first order expressible.

In frame logic, we instead rewrite the recursive definition $list(\cdot)$ to a new
one $list'(\cdot)$ that captures whether x points to a list, assuming that n(y) = z
(see Section 4.4). This property continues to be expressible in frame logic and 
can be converted to first-order logic with recursive definitions (see Section 3.5). 
Note that we are exploiting the fact that there is only a bounded amount of 
change to the heap in straight-line programs in order to express this in FL. 

Let us turn to expressiveness and compactness. In separation logic, separa- 
tion of structures is expressed using *, and in frame logic, such a separation 
is expressed using conjunction and an additional constraint that says that the 
supports of the two formulas are disjoint. A precise separation logic formula
of the form $\alpha_1 * \alpha_2 * \ldots * \alpha_n$ is compact and would get translated to a much
larger formula in frame logic as it would have to state that the supports of
each pair of formulas are disjoint. We believe this can be tamed using macros
($Star(\alpha, \beta) = \alpha \wedge \beta \wedge Sp(\alpha) \cap Sp(\beta) = \emptyset$).

There are, however, several situations where frame logic leads to more com- 
pact and natural formulations. For instance, consider expressing the property 
that x and y point to lists, which may or may not overlap. In Frame Logic,
we simply write $list(x) \wedge list(y)$. The support of this formula is the union of
the supports of the two lists. In separation logic, we cannot use $*$ to write
this compactly (while capturing the tightest heaplet). Note that the formula
$(list(x) * true) \wedge (list(y) * true)$ is not equivalent, as it is true in heaplets that
are larger than the set of locations of the two lists. The simplest formulation we
know is to write a recursive definition $lseg(u, v)$ for list segments from u to v and
use quantification: $(\exists z.\ lseg(x, z) * lseg(y, z) * list(z)) \vee (list(x) * list(y))$ where
the definition of lseg is the following: $lseg(u, v) = (u = v \wedge emp) \vee (\exists w.\ u \to
w * lseg(w, v))$.

If we wanted to say $x_1, \ldots, x_n$ all point to lists, that may or may not overlap,
then in FL we can say $list(x_1) \wedge list(x_2) \wedge \ldots \wedge list(x_n)$. However, in separation
logic, the simplest way seems to be to write using lseg and a linear number
of quantified variables and an exponentially-sized formula. Now consider the
property saying $x_1, \ldots, x_n$ all point to binary trees, with pointers left and right,
and that can overlap arbitrarily. We can write it in FL as $tree(x_1) \wedge \ldots \wedge tree(x_n)$,
while a formula in (first-order) separation logic that expresses this property
seems very complex.
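To make the contrast concrete, the sketch below evaluates supports on one small, explicit heap. It is only an illustration under our own assumptions: the heap is a Python dictionary for a single pointer field n, nil is modelled by `None`, and the helper names `list_support`, `conj_support` and `star_holds` are hypothetical.

```python
def list_support(heap, x):
    """Locations of the list starting at x, or None if x does not point to a list."""
    seen = set()
    while x is not None:
        if x in seen or x not in heap:
            return None          # cyclic or dangling: not a list
        seen.add(x)
        x = heap[x]              # follow the n pointer
    return seen


def conj_support(heap, x, y):
    """Support of list(x) /\\ list(y): the union of both supports, if both hold."""
    sx, sy = list_support(heap, x), list_support(heap, y)
    if sx is None or sy is None:
        return None
    return sx | sy


def star_holds(heap, x, y):
    """Star(list(x), list(y)): both hold and their supports are disjoint."""
    sx, sy = list_support(heap, x), list_support(heap, y)
    return sx is not None and sy is not None and sx.isdisjoint(sy)


# Two lists, 1 -> 2 -> 3 and 4 -> 3, sharing the tail cell 3.
heap = {1: 2, 2: 3, 3: None, 4: 3}
print(conj_support(heap, 1, 4))
print(star_holds(heap, 1, 4))
```

On this heap the plain conjunction holds and its support is the union of the two lists, while the Star macro fails because both lists contain location 3.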


In summary, we believe that frame logic is a logic that supports frame rea- 
soning built on the same principles as separation logic, but is still translatable 
to first-order logic (avoiding the magic wand), and makes different choices for 
syntax/semantics that lead to expressing certain properties more naturally and 
compactly, and others more verbosely. 


Reasoning with Frame Logic using First-Order Reasoning Mechanisms. An ad- 
vantage of the adherence of frame logic to being translatable to a first-order 
logic with recursive definitions is the power to reason with it using first-order 
theorem proving techniques. While we do not present tools for reasoning in this 
paper, we note that there are several reasoning schemes that can readily handle 
first-order logic with recursive definitions. 

The theory of dynamic frames [18] has been proposed for frame reasoning for 
heap manipulating programs and has been adopted in verification engines like 
Dafny [21] that provide automated reasoning. A key aspect of dynamic frames 
is the notion of regions, which are subsets of locations that can be used to 
define subsets of the heap that change or do not change when a piece of code 
is executed. Program logics such as region logic have been proposed for object- 
oriented programs using such regions [1-3]. The supports of formulas in frame 
logic are also used to express such regions, but the key difference is that the 
definition of regions is given implicitly using supports of formulas, as opposed 
to explicitly defining them. Separation logic also defines regions implicitly, and 
in fact, the work on implicit dynamic frames [31, 39] provides translations from
separation logic to regions for reasoning using dynamic frames. 

Reasoning with regions using set theory in a first-order logic with recursive 
definitions has been explored by many works to support automated reasoning. 
Tools like VAMPIRE [20] for first-order logic have been extended in recent work to 
handle algebraic datatypes [19]; many data-structures in practice can be modeled 
as algebraic datatypes and the schemes proposed in [19] are powerful tools to 
reason with them using first-order theorem provers. 

A second class of tools are those proposed in the work on natural proofs [23, 
32, 37]. Natural proofs explicitly work with first order logic with recursive defi- 
nitions (FO-RD), implementing validity through a process of unfolding recursive 
definitions, uninterpreted abstractions, and proving inductive lemmas using in- 
duction schemes. Natural proofs are currently used primarily to reason with 
separation logic by first translating verification conditions arising from Hoare 
triples with separation logic specifications (without magic wand) to first-order 
logic with recursive definitions. Frame logic reasoning can also be done in a very 
similar way by translating it first to FO-RD. 

The work in [23] considers natural proofs and quantifier instantiation heuris- 
tics for FO-RD (using a similar setup of foreground sort for locations and back- 
ground sorts), and the work identifies a fragment of FO-RD (called safe fragment) 
for which this reasoning is complete (in the sense that a formula is detected as 
unsatisfiable by quantifier instantiation iff it is unsatisfiable with the inductive 
definitions interpreted as fixpoints and not least fixpoints). Since FL can be 
translated to FO-RD, it is possible to deal with FL using the techniques of [23]. 
The conditions for the safe fragment of FO-RD are that the quantifiers over 
the foreground elements are the outermost ones, and that terms of foreground 
type do not contain variables of any background type. As argued in [23], these 
restrictions are typically satisfied in heap logic reasoning applications. 


7 Related Work 


The frame problem [13] is an important problem in many different domains of 
research. In the broadest form, it concerns representing and reasoning about 
the effects of a local action without requiring explicit reasoning regarding static 
changes to the global scope. For example, in artificial intelligence one wants a 
logic that can seamlessly state that if a door is opened in a lit room, the lights 
continue to stay switched on. This issue is present in the domain of verification 
as well, specifically with heap-manipulating programs. 

There are many solutions that have been proposed to this problem. The most 
prominent proposal in the verification context is separation logic [12, 27, 28, 38], 
which we discussed in detail in the previous section. 

In contrast to separation logic, the work on Dynamic Frames [17, 18] and 
similarly inspired approaches such as Region Logic [1-3] allow methods to ex- 
plicitly specify the portion of the support that may be modified. This allows 
fine-grained control over the modifiable section, and avoids special symbols like 
$*$ and $\mathrel{-\!\!*}$. However, explicitly writing out frame annotations can become verbose
and tedious. 

The work on Implicit Dynamic Frames [22, 39, 40] bridges the worlds of 
separation logic (without magic wand) and dynamic frames: it uses separation
logic and fractional permissions to implicitly define frames (reducing annotation 
burden), allows annotations to access these frames, and translates them into set 
regions for first-order reasoning. Our work is similar in that frame logic also 
implicitly defines regions and gives annotations access to these regions, and can 
be easily translated to pure FO-RD for first-order reasoning. 

One distinction with separation logic involves the non-unique heaplets in 
separation logic and the unique heaplets in frame logic. Determined heaplets 
have been used [29, 32, 37] as they are more amenable to automated reasoning. In 
particular a separation logic fragment with determined heaplets known as precise 
predicates is defined in [29], which we capture using frame logic in Section 5. 

There is also a rich literature on reasoning with these heap logics for program 
verification. Decidability is an important dimension and there is a lot of work on 
decidable logics for heaps with separation logic specifications [4-6, 11, 26, 33]. 
The work based on EPR (Effectively Propositional Reasoning) for specifying 
heap properties [14-16] provides decidability, as does some of the work that 
translates separation logic specifications into classical logic [34]. 

Finally, translating separation logic into classical logics and reasoning with 
them is another solution pursued in a lot of recent efforts [10, 23, 24, 32,
34-37, 41]. Other techniques including recent work on cyclic proofs [9, 42] use 
heuristics for reasoning about recursive definitions. 


8 Conclusions 


Our main contribution is to propose Frame Logic, a classical first-order logic 
endowed with an explicit operator that recovers the implicit supports of formulas 
and supports frame reasoning. We have argued for its expressiveness by capturing several
properties of data-structures naturally and succinctly, and by showing that it 
can express a precise fragment of separation logic. The program logic built using 
frame logic supports local heap reasoning, frame reasoning, and weakest tightest 
preconditions across loop-free programs. 

We believe that frame logic is an attractive alternative to separation logic, 
built using similar principles as separation logic while staying within the first- 
order logic world. The first-order nature of the logic makes it potentially amenable 
to easier automated reasoning. 

A practical realization of a tool for verifying programs in a standard program- 
ming language with frame logic annotations by marrying it with existing auto- 
mated techniques and tools for first-order logic (in particular [19, 24, 32, 37, 41]), 
is the most compelling future work. 
Abstract. To provide high availability in distributed systems, object
replicas allow concurrent updates. Although replicas eventually converge, 
they may diverge temporarily, for instance when the network fails. This 
makes it difficult for the developer to reason about the object’s prop- 
erties, and in particular, to prove invariants over its state. For the sub- 
class of state-based distributed systems, we propose a proof methodology 
for establishing that a given object maintains a given invariant, taking 
into account any concurrency control. Our approach allows reasoning 
about individual operations separately. We demonstrate that our rules 
are sound, and we illustrate their use with some representative examples. 
We automate the rule using Boogie, an SMT-based tool. 
1 Introduction


Many modern applications serve users accessing shared data in different ge- 
ographical regions. Examples include social networks, multi-user games, co- 
operative engineering, collaborative editors, source-control repositories, or dis- 
tributed file systems. One approach would be to store the application’s data 
(which we call object) in a single central location, accessed remotely. However, 
users far from the central location would suffer long delays and outages. 

Instead, the object is replicated to several locations. A user accesses the 
closest available replica. To ensure availability, an update must not synchronise 
across replicas; otherwise, when a network partition occurs, the system would 
block. Thus, a replica executes both queries and updates locally, and propagates 
its updates to other replicas asynchronously. 

Updates at different locations are concurrent; this may cause replicas to
diverge, at least temporarily. However, if the system ensures Strong Eventual
Consistency (SEC), replicas that have received the same set of updates have
the same state [25], which simplifies the reasoning.

The replicated object may also be required to maintain some (application-specific)
invariant, an assertion about the object. We say a state is safe if the invariant 
is true in that state; the system is safe if every reachable state is safe. In a se- 
quential system, this is straightforward (in principle): if the initial state is safe, 
and the final state of every update individually is safe, then the system is safe.
However, these conditions are not sufficient in the replicated case, because con- 
current updates at different replicas may interfere with one another. This can be 
fixed by synchronising between some or all types of updates. To maximise availability
and minimise latency, such synchronisation should be kept to a minimum. In this paper, we
propose a proof methodology to ensure that a given object is system-safe, for a 
given invariant and a given amount of concurrency control. In contrast to previous
works, we consider state-based objects.¹ Indeed, the specific properties of
state-based propagation enable simple modular reasoning despite concurrency, 
thanks to the concept of concurrency invariant. Our proof methodology derives 
the concurrency invariant automatically from the sequential specification. Now, 
if the initial state is safe, and every update maintains both the application in- 
variant and the concurrency invariant, then every reachable state is safe, even 
in concurrent executions, regardless of network partitions. We have developed 
a tool named Soteria, to automate our proof methodology. Soteria analyses the 
specification to detect concurrency bugs and provides counterexamples. 
The contributions of this paper are as follows: 

— We propose a novel proof system specialised to proving the safety of avail- 
able objects that converge by propagating state. This specialisation supports 
modular reasoning, and thus it enables automation. 

— We demonstrate that this proof system is sound. Moreover, we provide a sim- 
ple semantics for state-propagating systems that allows us to ignore network 
messages altogether. 

— We present Soteria, to the best of our knowledge the first tool support- 
ing the verification of program invariants for state-based replicated objects. 
When Soteria succeeds it ensures that every execution, whether replicas are 
partitioned or concurrent, is safe. 

— We present a number of representative case studies, which we run through 
Soteria. 


2 Background 


As a running example, consider a simple auction system (for simplicity, we con- 
sider a single auction). An auction object is composed of the following parts: 


— Its Status, that can move from initial state INVALID (under preparation) to
ACTIVE (can receive bids) and then to CLOSED (no more bids accepted).
— The Winner of the auction, that is initially $\bot$ and can become the bid taking
the highest amount. In case of ties, the bid with the lowest id wins.
— The set of Bids placed, that is initially empty. A bid is a tuple composed of
    • BidId: A unique identifier
    • Placed: A boolean flag to indicate whether the bid has been placed or
      not. Initially, it is FALSE. Once placed, a bid cannot be withdrawn.
    • The monetary Amount of the bid; this cannot be modified once the bid
      is created.
Fig. 1: Evolution of state of an auction object


Figure 1 illustrates how the auction state evolves over time. The state of the 
object is geo-replicated at data centers in Adelaide, Brussels, and Calgary. Users 
at different locations can start an auction, place bids, close the auction, declare 
a winner, inspect the local replica, and observe if a winner is declared and who 
it is. The updates are propagated asynchronously to other replicas. All replicas 
will eventually agree on the same auction status, the same set of bids and the 
same winner. 

There are two basic approaches to propagating updates. The operation-based 
approach applies an update to some origin replica, then transmits the operation 
itself to be replayed at other replicas. If messages are delivered in causal order, 
exactly once, and concurrent operations are commutative, then two replicas that 
received the same updates reach the same state (this is the Strong Eventual 
Consistency guarantee, or SEC) [25]. 

The state-based approach applies an update to some origin replica. Occasion- 
ally, a replica sends its full state to some other replica, which merges the received 
state into its own. If the state space forms a monotonic semi-lattice, an update 
is an inflation (its output state is not lesser than the input state), and merge 
computes the least-upper-bound of the local and received states, then SEC is 
guaranteed [25]. As long as every update eventually reaches every replica, mes- 
sages may be dropped, re-ordered or duplicated, and the set of replicas may be 
unknown. Due to these relaxed requirements, state-based propagation is widely 
used in industry. Figure 1 shows the state-based approach with local operations 
and merges. Alternatives exist where only a delta of the state —that is, the 
portion of the state not known to be part of the other replicas— is sent as a 
message [1]; since this is an optimisation, it is of no consequence to the results 
of this paper. 
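The three conditions for the state-based approach (a semi-lattice of states, updates that are inflations, and a merge that computes the least upper bound) can be checked exhaustively on a toy state space. The sketch below is our own illustration, not taken from the paper: states are finite sets ordered by inclusion, the update `add` inserts one element, `merge` is set union, and the helper names `all_states` and `laws_hold` are hypothetical.

```python
from itertools import chain, combinations


def merge(a: frozenset, b: frozenset) -> frozenset:
    """Least upper bound of two states in the lattice of sets ordered by inclusion."""
    return a | b


def add(state: frozenset, item: int) -> frozenset:
    """An update; it is an inflation because state <= add(state, item)."""
    return state | {item}


def all_states(universe):
    """Every subset of the universe, as a frozenset."""
    items = sorted(universe)
    subsets = chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))
    return [frozenset(s) for s in subsets]


def laws_hold(universe) -> bool:
    """Check idempotence, commutativity, associativity, upper bound and inflation."""
    states = all_states(universe)
    for a in states:
        if merge(a, a) != a:
            return False
        if any(not a <= add(a, item) for item in universe):
            return False
        for b in states:
            joined = merge(a, b)
            if joined != merge(b, a) or not (a <= joined and b <= joined):
                return False
            if any(merge(joined, c) != merge(a, merge(b, c)) for c in states):
                return False
    return True


print(laws_hold({1, 2, 3}))
```

Idempotence and commutativity of `merge` are what make duplicated and re-ordered messages harmless in this setting.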


1 As opposed to operation-based. These terms are defined in Section 2. 
Looking back to Figure 1, we can see that replicas diverge temporarily. This
temporary divergence can lead to an unsafe state, in this case declaring a wrong 
winner. This correctness problem has been addressed before; however, previous 
works mostly consider the operation-based propagation approach [11, 13, 19, 24]. 


3 System Model 


In this section, we first introduce the object components, explain the underlying 
system model informally, and then formalise the operational semantics. 


3.1 General Principles 


An object consists of a state, a set of operations, a merge function and an in- 
variant. Figure 1 illustrates three replicas of an auction object, at three different 
locations, represented by the horizontal lines. The object evolves through a set of 
states. Each line depicts the evolution of the state of the corresponding replica; 
time flows from left to right. 


State. A distributed system consists of a number of servers, with disjoint memory 
and processing capabilities. The servers might be distributed over geographical 
regions. A set of servers at a single location stores the state of the object. This is 
called a single replica. The object is replicated at different geographical locations, 
each location having a full copy of the state. In the simplest case (for instance at 
initialisation) the state at all replicas will be identical. The state of each replica 
is called a local state. The global view, comprising all local states is called the 
global state. 


Operations. Each replica may perform the operations defined for the object. 
To support availability, an operation modifies the local state at some arbitrary 
replica, the origin replica for that operation, without synchronising with other 
replicas (the cost of synchronisation being significant at scale). An operation 
might consist of several changes; these are applied to the replica as a single 
atomic unit. 

Executing an operation on its origin replica has an immediate effect. However, 
the state of the other replicas, called remote replicas, remains unaltered at this 
point. The remote replicas get updated when the state is eventually propagated. 
An immediate consequence of this execution model is that in the presence of 
concurrent operations, replicas can reach different states, i.e. they diverge. 

Let us illustrate this with our example in Figure 1. Initially, the auction 
is yet to start, the winner is not declared and no bids are placed. By de- 
fault, a replica can execute any operation - start_auction, place_bid, and 
close_auction - locally without synchronising with other replicas. We see that 
the local states of replicas occasionally diverge. For example at the point where 
operation close_auction completes at the Adelaide replica, the Adelaide replica 
is aware of only a $100 bid, the Brussels replica has two bids, and the Calgary 
replica observes only one bid for $105. 
State Propagation. A replica occasionally propagates its state to other replicas
in the system and a replica receiving a remote state merges it into its own. 

In Figure 1, the arrows crossing between replicas represent the delivery of a 
message containing the state of the source replica, to be merged into the target 
replica. A message is labelled with the state propagated. For instance, the first 
message delivery at the Brussels replica represents the result of updating the 
local state (setting auction status to ACTIVE), with the state originating in the 
replica at Adelaide (auction started). 

Similar to the operations, a merge is atomic. In Figure 1, Alice closes the 
auction at the Adelaide replica. This atomically sets the status of the auction 
to CLOSED and declares a winner from the set of bids it is aware of. The up- 
dated auction state and winner are transmitted together. Merging is performed 
atomically by the Brussels replica.²

We now specify the merge operation for an auction. The receiving replica's
local state is denoted $\sigma = (status, winner, Bids)$, the received state is denoted
$\sigma' = (status', winner', Bids')$ and the result of merge is denoted as $\sigma_{new} =
(status_{new}, winner_{new}, Bids_{new})$.

merge((status, winner, Bids), (status', winner', Bids'))
    status_new := max(status, status')
    winner_new := winner' ≠ ⊥ ? winner' : winner
    for (b in Bids ∪ Bids')
        Bids_new.b.placed := Bids.b.placed ∨ Bids'.b.placed
        Bids_new.b.amount := max(Bids.b.amount, Bids'.b.amount)


Furthermore, we require the operations and merge to be defined in a way that 
ensures convergence. We discuss the relevant properties later in Section 6.1. 
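For readers who prefer executable code, the merge above can be sketched in Python as follows. The data layout is our own assumption: `Status` is ordered INVALID < ACTIVE < CLOSED so that `max` applies, `None` plays the role of ⊥, bids are a dictionary keyed by bid identifier, and a bid known to only one replica is copied from that replica (the pseudocode leaves this case implicit). All class and field names are illustrative.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, Optional


class Status(IntEnum):
    INVALID = 0
    ACTIVE = 1
    CLOSED = 2


@dataclass(frozen=True)
class Bid:
    placed: bool
    amount: int


@dataclass(frozen=True)
class Auction:
    status: Status
    winner: Optional[int]        # bid id of the winner; None stands for "no winner yet"
    bids: Dict[int, Bid]


def merge(local: Auction, received: Auction) -> Auction:
    """Merge a received auction state into the local one, field by field."""
    status = max(local.status, received.status)
    winner = received.winner if received.winner is not None else local.winner
    bids = {}
    for bid_id in local.bids.keys() | received.bids.keys():
        a, b = local.bids.get(bid_id), received.bids.get(bid_id)
        if a is None:            # assumption: unknown locally, take the received bid
            bids[bid_id] = b
        elif b is None:          # assumption: unknown remotely, keep the local bid
            bids[bid_id] = a
        else:
            bids[bid_id] = Bid(a.placed or b.placed, max(a.amount, b.amount))
    return Auction(status, winner, bids)


# Adelaide closed the auction knowing only bid 1; Brussels also knows bid 2.
adelaide = Auction(Status.CLOSED, 1, {1: Bid(True, 100)})
brussels = Auction(Status.ACTIVE, None, {1: Bid(True, 100), 2: Bid(True, 105)})
merged = merge(brussels, adelaide)
print(merged.status.name, merged.winner, sorted(merged.bids))
```

In this run the merged state is CLOSED with bid 1 as winner even though bid 2 carries the higher amount, which matches the kind of divergence-induced problem described for the Figure 1 scenario.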


Invariants. An invariant is an assertion that must evaluate to true in every local 
state of every replica. Although evaluated locally at each replica, the invariant 
is in effect global, since it must be true at all replicas, and replicas eventually 
converge. For our running example, the invariant can be stated as follows: 


— Only an active auction can receive bids, and 
— the highest unique bid wins when the auction closes (breaking ties using bid 
identifiers). 


This condition must hold true in all possible executions of the object. 


3.2 Notations and Assumptions 
First, we introduce some notations and assumptions: 


— We assume a fixed set of replicas, ranged over with the meta-variable $r \in R$
sampled from the domain of unique replica names $R$.

— We denote a local state with the meta-variable $\sigma \in \Sigma$ ranged over the domain
of states of the object $\Sigma$.


2 We see that this leads to an unsafe state; we discuss this in detail in Section 4.2.
— The local semantic function $[\![\cdot]\!]$ takes an operation and a state, and returns the
state after applying the operation. We write $[\![op]\!](\sigma) = \sigma_{new}$ for executing
operation $op$ on state $\sigma$ resulting in a new state $\sigma_{new}$.

— $\Omega$ denotes a partial function returning the current state of a replica. For
instance $\Omega(r) = \sigma$ means that in global state $\Omega$, replica $r$ is in local state
$\sigma$. We will use the notation $\Omega[r \leftarrow \sigma]$ to denote the global state resulting
from replacing the local state of replica $r$ with $\sigma$. The local state of all other
replicas remains unchanged in the resulting global state.³

— A message propagating states between replicas is denoted $\langle r \xrightarrow{\sigma} r' \rangle$. This
represents the fact that replica $r$ has sent a message (possibly not yet
received) to replica $r'$, with the state $\sigma$ as its payload. The meta-variable $M$
denotes the messages in transit in the network.

— In the following sub-section, we will utilise a set of states to record the history
of the execution. The set of past states will be ranged over with the variable
$S \in \mathcal{P}(\Sigma)$.

— All replicas are assumed to start in the same initial state $\sigma_i$. Formally, for
each replica $r \in \mathrm{dom}(\Omega_i)$ we have $\Omega_i(r) = \sigma_i$.


3.3 Operational Semantics 


In this and the following subsections we will present two semantics for systems 
propagating states. Importantly, while the first semantics takes into account 
the effects of the network on the propagation of the states, and is hence an 
accurate representation of the execution of systems with state propagation, we 
will show in the next subsection that reasoning about the network is unnecessary 
in this kind of system. We will demonstrate this claim by presenting a much 
simpler semantics in which the network is abstracted away. The importance 
of this reduction is that the number of events to be considered, both when 
conducting proofs and when reasoning about applications, is greatly reduced. 
As informal evidence of this claim, we point at the difference in complexity 
between the semantic rules presented in Figure 2 and Figure 3. We postpone the 
equivalence argument to Theorem 1. 

Figure 2 presents the semantic rules describing what we shall call the precise 
semantics (we will later present a more abstract version) defining the transition 
relations describing how the state of the object evolves. 

The figure defines a semantic judgement of the form $(\Omega, M) \rightarrow (\Omega_{new}, M_{new})$
where $(\Omega, M)$ is a configuration where the replica states are given by $\Omega$ as shown
above, and $M$ is a set of messages that have been transmitted by different replicas
and are pending to be received by their target replicas.

Rule OPERATION presents the state transition resulting from a replica r 
executing an operation op. The operation queries the state of replica r, evaluates 
the semantic function for operation op and updates its state with the result. The 


3 This notation of a global state is used only to explain and prove our proof rule. In 
fact, the rule is based only on the local state of each replica. 
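OPERATION
$$\frac{\Omega(r) = \sigma \qquad [\![op]\!](\sigma) = \sigma_{new} \qquad \Omega_{new} = \Omega[r \leftarrow \sigma_{new}]}{(\Omega, M) \rightarrow (\Omega_{new}, M)}$$

SEND
$$\frac{\Omega(r) = \sigma \qquad r' \in \mathrm{dom}(\Omega) \setminus \{r\} \qquad M_{new} = M \cup \{\langle r \xrightarrow{\sigma} r' \rangle\}}{(\Omega, M) \rightarrow (\Omega, M_{new})}$$

MERGE
$$\frac{\Omega(r) = \sigma \qquad \langle r' \xrightarrow{\sigma'} r \rangle \in M \qquad M_{new} = M \setminus \{\langle r' \xrightarrow{\sigma'} r \rangle\} \qquad [\![merge]\!](\sigma, \sigma') = \sigma_{new} \qquad \Omega_{new} = \Omega[r \leftarrow \sigma_{new}]}{(\Omega, M) \rightarrow (\Omega_{new}, M_{new})}$$

OP & BROADCAST
$$\frac{\Omega(r) = \sigma \qquad [\![op]\!](\sigma) = \sigma_{new} \qquad \Omega_{new} = \Omega[r \leftarrow \sigma_{new}] \qquad M_{new} = M \cup \{\langle r \xrightarrow{\sigma_{new}} r' \rangle \mid r' \in \mathrm{dom}(\Omega) \setminus \{r\}\}}{(\Omega, M) \rightarrow (\Omega_{new}, M_{new})}$$

MERGE & BROADCAST
$$\frac{\begin{array}{c}\Omega(r) = \sigma \qquad \langle r' \xrightarrow{\sigma'} r \rangle \in M \qquad M_{new} = M \setminus \{\langle r' \xrightarrow{\sigma'} r \rangle\} \qquad [\![merge]\!](\sigma, \sigma') = \sigma_{new} \qquad \Omega_{new} = \Omega[r \leftarrow \sigma_{new}] \\ M'_{new} = M_{new} \cup \{\langle r \xrightarrow{\sigma_{new}} r'' \rangle \mid r'' \in \mathrm{dom}(\Omega) \setminus \{r\}\}\end{array}}{(\Omega, M) \rightarrow (\Omega_{new}, M'_{new})}$$

Fig. 2: Precise Operational Semantics: Messages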


set of messages M does not change. The second rule, SEND, represents the non- 
deterministic sending of the state of replica r to replica r’. The rule has no other 
effect than to add a message to the set of pending messages M. The MERGE rule 
picks any message, ( r’ = r ), in the set of pending messages M, and applies the 
merge function to the destination replica with the state in the payload of the 


+ 
message, removing ( r’ 7+ r ) from M. 


The final two rules, OP & BROADCAST and MERGE & BROADCAST represent 
the specific case when the states are immediately sent to all replicas. These rules 
are not strictly necessary since they are subsumed by the application of either 
OPERATION or MERGE followed by one SEND per replica. We will, however, use 
them to simplify a simulation argument in what follows. 
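The rules of Figure 2 can be exercised on a toy object. The sketch below is illustrative only: it assumes a state that is a single integer ordered by <=, with max as the merge function, and it keeps the pending messages in a list rather than a set. The names Config, operation, send, merge_step and op_and_broadcast are hypothetical and are not part of the formal development.

```python
from dataclasses import dataclass, field

# Assumption: the replicated state is one integer ordered by <=, and merge is
# max (a least upper bound). Real objects are richer; this is a toy.
def merge(sigma, sigma_remote):
    return max(sigma, sigma_remote)

@dataclass
class Config:
    omega: dict                               # replica id -> local state
    msgs: list = field(default_factory=list)  # pending (src, payload, dst)

def operation(cfg, r, op):
    """OPERATION: apply op to the state of r; M is unchanged."""
    cfg.omega[r] = op(cfg.omega[r])

def send(cfg, r, r_dst):
    """SEND: add a message carrying the current state of r; no other effect."""
    assert r_dst in cfg.omega and r_dst != r
    cfg.msgs.append((r, cfg.omega[r], r_dst))

def merge_step(cfg, msg):
    """MERGE: remove one pending message and merge its payload at the target."""
    cfg.msgs.remove(msg)
    _, payload, dst = msg
    cfg.omega[dst] = merge(cfg.omega[dst], payload)

def op_and_broadcast(cfg, r, op):
    """OP & BROADCAST: OPERATION followed by one SEND per other replica."""
    operation(cfg, r, op)
    for r_dst in cfg.omega:
        if r_dst != r:
            send(cfg, r, r_dst)

cfg = Config(omega={"r1": 0, "r2": 0, "r3": 0})
op_and_broadcast(cfg, "r1", lambda s: s + 1)   # r1 moves to 1, two messages pending
stale = cfg.msgs[0]                            # the message addressed to r2
operation(cfg, "r2", lambda s: s + 5)          # r2 moves ahead locally
merge_step(cfg, stale)                         # the delayed message is delivered
```

In this run the delayed message reaches r2 after r2 has already moved to a larger state, so merging it leaves r2 unchanged, while the message for r3 stays pending.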


We remark at this point that no assumptions are made about the duplication
of messages or the order in which messages are delivered. This is in contrast to
other works on the verification of properties of replicated objects [11, 13]. The
absence of such assumptions is not a problem in our case because the
least-upper-bound assumption on the merge function, as well as the inflation
assumptions on the states considered in Item 2 (Section 6.1), mean that delayed
messages have no effect when they are merged.


OPERATION
$$\frac{\Omega(r) = \sigma \quad [\![op]\!](\sigma) = \sigma_{new} \quad \Omega_{new} = \Omega[r \leftarrow \sigma_{new}]}{(\Omega, S) \to (\Omega_{new}, S \cup \{\sigma_{new}\})}$$

MERGE
$$\frac{\Omega(r) = \sigma \quad \sigma' \in S \quad [\![merge]\!](\sigma, \sigma') = \sigma_{new} \quad \Omega_{new} = \Omega[r \leftarrow \sigma_{new}]}{(\Omega, S) \to (\Omega_{new}, S \cup \{\sigma_{new}\})}$$

Fig. 3: Semantic Rules with a History of States


As customary we will denote with $(\Omega, M) \xrightarrow{*} (\Omega_{new}, M_{new})$ the
repeated application of the semantic rules zero or more times, from the state
$(\Omega, M)$ resulting in the state $(\Omega_{new}, M_{new})$.

It is easy to see how the example in Figure 1 proceeds according to these 
rules for the auction. 

The following lemma,⁴ to be used later, establishes that whenever we use
only the broadcast rules, for any intermediate state in the execution, and for
any replica, when considering the final state of the trace, either the replica
has already observed a fresher version of the state in the execution, or there
is a message pending for it with that state. This is an obvious consequence of
broadcasting.


Lemma 1. If we consider a restriction to the semantics of Figure 2 where,
instead of applying the OPERATION rule of Figure 2, we always apply the
OP & BROADCAST rule, and instead of applying the MERGE rule we always apply
MERGE & BROADCAST, we can conclude that given an execution starting from an
initial global state $\Omega_i$ with

$$(\Omega_i, \emptyset) \xrightarrow{*} (\Omega, M) \xrightarrow{*} (\Omega_{new}, M_{new})$$

for any two replicas r and r' and a state $\sigma$ such that $\Omega(r) = \sigma$, then
either:

(i) $\Omega_{new}(r') \geq \sigma$, or
(ii) $\langle r \xrightarrow{\sigma} r' \rangle \in M_{new}$.


3.4 Operational Semantics with State History 


We now turn our attention to a simpler semantics where we omit messages from 
configurations, but instead, we record in a separate set all the states occurring 
in any replica throughout the execution. 

The semantics in Figure 3 presents a judgement of the form
$(\Omega, S) \to (\Omega_{new}, S_{new})$ between configurations of the form $(\Omega, S)$,
where $\Omega$ is as before, but where the set of messages is replaced by a set of
states denoted with the meta-variable $S \in \mathcal{P}(\Sigma)$.


4 The proofs for the lemmas are included in the extended version [23].


The rules are simple. OPERATION executes an operation as before, and it
adds the resulting new state to the set of observed states. The rule MERGE
non-deterministically selects a state in the set of states and merges a
non-deterministically chosen replica with it. The resulting state is also added
to the set of observed states.


Lemma 2. Consider a state $(\Omega, S)$ reachable from an initial global state
$\Omega_i$ with the semantics of Figure 3. Formally:
$(\Omega_i, \{\sigma_i\}) \xrightarrow{*} (\Omega, S)$. We can conclude that the set of
recorded states in the final configuration S includes all of the states present
in any of the replicas:

$$\bigcup_{r \in dom(\Omega)} \{\Omega(r)\} \subseteq S$$
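The history semantics and the containment stated in Lemma 2 can be tried out on random executions. This is a minimal sketch under stated assumptions: states are integers with max as merge, operations are random positive increments, and the helper names (operation, merge_step, random_run) are hypothetical. It tests the property on sample runs and does not replace the proof.

```python
import random

def merge(sigma, sigma_other):
    # Assumption: integer states with max as the least upper bound.
    return max(sigma, sigma_other)

def operation(omega, history, r, op):
    """OPERATION of Figure 3: update replica r and record the new state in S."""
    sigma_new = op(omega[r])
    omega[r] = sigma_new
    history.add(sigma_new)

def merge_step(omega, history, r, sigma_other):
    """MERGE of Figure 3: merge any recorded state into replica r."""
    assert sigma_other in history
    sigma_new = merge(omega[r], sigma_other)
    omega[r] = sigma_new
    history.add(sigma_new)

def random_run(steps, seed):
    rng = random.Random(seed)
    omega = {"r1": 0, "r2": 0, "r3": 0}   # every replica starts in sigma_i = 0
    history = {0}                         # S starts as {sigma_i}
    for _ in range(steps):
        r = rng.choice(sorted(omega))
        if rng.random() < 0.5:
            operation(omega, history, r, lambda s: s + rng.randint(1, 3))
        else:
            merge_step(omega, history, r, rng.choice(sorted(history)))
        # Lemma 2: every replica state is a recorded state.
        assert set(omega.values()) <= history
    return omega, history

omega, history = random_run(steps=200, seed=0)
```

Note that MERGE may pick any recorded state, including very old ones; with a least-upper-bound merge this never moves a replica backwards.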


3.5 Correspondence between the semantics 


In this section, we show that removing the messages from the semantics, and 
choosing to record states instead renders the same executions. To that end, we 
will define the following relation between configurations of the two semantics 
which will be later shown to be a bisimulation. 


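Definition 1 (Bisimulation Relation). We define the relation
$\mathcal{R}_{\Omega_i}$ between a configuration $(\Omega, M)$ of the semantics of Figure 2
and a configuration $(\Omega, S)$ of the semantics of Figure 3, parameterized by an
initial global state $\Omega_i$ and denoted by

$$(\Omega, M) \ \mathcal{R}_{\Omega_i} \ (\Omega, S)$$

when the following conditions are met:

1. $(\Omega_i, \emptyset) \xrightarrow{*} (\Omega, M)$, and
2. $(\Omega_i, \{\sigma_i\}) \xrightarrow{*} (\Omega, S)$, and
3. $\{\sigma \mid \langle r \xrightarrow{\sigma} r' \rangle \in M\} \subseteq S$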


In other words, two states represented in the two configurations are related
if both are reachable from an initial global state and all the states transmitted
by the messages (M) are present in the history (S).

We can now show that this relation is indeed a bisimulation. We first show 
that the semantics of Figure 3 simulates that of Figure 2. That is, all behaviours 
produced by the precise semantics with messages can also be produced by the 
semantics with history states. This is illustrated in the commutative diagram 
of Figure 4a and Figure 4b, where the dashed arrows represent existentially 
quantified components that are proven to exist in the theorem. 


Lemma 3 (State-semantics simulates Messages-semantics). Consider a
reachable state $(\Omega, M)$ from the initial state $\Omega_i$ in the semantics of
Figure 2. Consider moreover that according to that semantics there exists a
transition of the form

$$(\Omega, M) \to (\Omega_{new}, M_{new})$$

and consider that there exists a state $(\Omega, S)$ of the history-preserving
semantics of Figure 3 such that they are related by the simulation relation

$$(\Omega, M) \ \mathcal{R}_{\Omega_i} \ (\Omega, S)$$

We can conclude that, as illustrated in Figure 4a, there exists a state
$(\Omega_{new}, S_{new})$ such that

$$(\Omega, S) \to (\Omega_{new}, S_{new}) \quad \text{and} \quad (\Omega_{new}, M_{new}) \ \mathcal{R}_{\Omega_i} \ (\Omega_{new}, S_{new})$$


[Figure 4, two commutative diagrams. (a) Precise to History-preserving
Simulation: the execution $(\Omega_i, \emptyset) \xrightarrow{*} (\Omega, M) \to (\Omega_{new}, M_{new})$
is related by $\mathcal{R}_{\Omega_i}$ to $(\Omega_i, \{\sigma_i\}) \xrightarrow{*} (\Omega, S)$, and the
step to $(\Omega_{new}, S_{new})$ is dashed. (b) History-preserving to Precise
Simulation: the same schema with the roles of the two semantics exchanged, the
step to $(\Omega_{new}, M_{new})$ being dashed.]

Fig. 4: Simulation Schema


We will now consider the lemma showing the inverse relation. To that end we 
will consider a special case of the semantics of Figure 2 where instead of apply- 
ing the OPERATION rule, we will always apply the OP & BROADCAST rule, and 
instead of the MERGE rule, we will apply MERGE & BROADCAST. As we men- 
tioned before, this is equivalent to the application of the OPERATION/MERGE 
rule, followed by a sequence of applications of SEND. The reason we will do this 
is that we are interested in showing that for any execution of the semantics in 
Figure 3 there is an equivalent (simulated) execution of the semantics of Fig- 
ure 2. Since all states can be merged in the semantics of Figure 3 we have to 
assume that in the semantics of Figure 2 the states have been sent with messages. 
Fortunately, we can choose how to instantiate the existential send messages to 
apply the rules as necessary, and that justifies this choice. 


Lemma 4 (Messages-semantics simulates State-semantics). Consider a
reachable state $(\Omega, S)$ from the initial state $\Omega_i$ in the semantics of
Figure 3. Consider moreover that according to that semantics there exists a
transition of the form

$$(\Omega, S) \to (\Omega_{new}, S_{new})$$

and consider that there exists a state $(\Omega, M)$ of the precise semantics of
Figure 2 such that they are related by the simulation relation

$$(\Omega, M) \ \mathcal{R}_{\Omega_i} \ (\Omega, S)$$

We can conclude that there exists a state $(\Omega_{new}, M_{new})$ such that

$$(\Omega, M) \to (\Omega_{new}, M_{new}) \quad \text{and} \quad (\Omega_{new}, M_{new}) \ \mathcal{R}_{\Omega_i} \ (\Omega_{new}, S_{new})$$

As before, an illustration of this lemma is presented in Figure 4b.
We can now conclude that the two semantics are bisimilar:


Theorem 1 (Bisimulation). The semantics of Figure 2 and Figure 3 are
bisimilar as established by the relation defined in Definition 1.


The theorem above justifies carrying out our proofs with respect to the
semantics of Figure 3, which has fewer rules and better aligns with our proof
methodology. It also justifies that, when reasoning semantically about
state-propagating object systems, we can generally ignore the effects of network
delays and messages.

From the standpoint of concurrency, the system model allows the execution of 
asynchronous concurrent operations, where each operation is executed atomically 
in each replica, and the aggregation of results of different operations is performed 
lazily as replicas exchange their state. At this point, we assume the set of states, 
along with the operations and merge, forms a monotonic semi-lattice. This is a 
sufficient condition for Strong Eventual Consistency [3, 4, 25]. 

We have seen that even though we achieve convergence later, there can be 
instances or even long periods of time during which replicas might diverge. We 
need to ensure that the concurrent executions are still safe. In the next section, 
we discuss how to ensure safety of distributed objects built on top of the system 
model we described. 


4 Proving Invariants 


In this section, we report our invariant verification strategy. Specifically, we con- 
sider the problem of verifying invariants of highly-available distributed objects. 

To support the verification of invariants we will consider a syntax-driven
approach based on program logic. Bailis et al. [2] identify necessary and
sufficient run-time conditions to establish the security of application
invariants for highly-available distributed databases in a criterion dubbed
I-confluence. Moreover, they consider the validity of a number of typical
invariants and applications. Our work improves on the I-confluence criterion
defined in [2] by providing a static, syntax-driven, and mostly-automatic
mechanism to verify the correctness of an invariant for an application. We will
address the specific differences in Section 7, the related work.

An important consequence of our verification strategy is that while we are 
proving invariants about a concurrent highly-distributed system, our verification 
conditions are modular (on the number of API operations), and can be carried 
out using standard sequential Hoare-style reasoning. These verification condi- 
tions in turn entail stability of the assertions as one would have in a logic like 
Rely/Guarantee. 

Let us start by assuming that a given initial state for the object is denoted
$\sigma_i$. Initially, all replicas have $\sigma_i$ as their local state. As explained
earlier, each replica executes a sequence of state transitions, due either to a
local update or to a merge incorporating remote updates.

Let us call safe state a replica state that satisfies the invariant. Assuming
the current state is safe, any update (local or merge) must result in a safe state.
To ensure this, every update is equipped with a precondition that disallows any
unsafe execution.⁵ Thus, a local update executes only when, at the origin replica,
the current state is safe and its precondition currently holds.

Formally, an update u (an operation or a merge) mutates the local state $\sigma$ to
a new state $\sigma_{new} = u(\sigma)$. To preserve the invariant, Inv, we require that the
local state respects the precondition of the update, $Pre_u$:
$\sigma \in Pre_u \implies u(\sigma) \in Inv$.

To illustrate local preconditions, consider an operation close_auction(w: BidId),
which sets the auction status to CLOSED and the winner to w (of type BidId).
The developer may have written a precondition such as status = ACTIVE because
closing an auction doesn't make sense otherwise. In order to ensure the
invariant that the winner has the highest amount, one needs to strengthen it
with the clause is_highest(Bids, w), defined as


$$\forall b \in Bids,\ b.placed \implies b.Amount \leq w.Amount$$
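Read operationally, the strengthened precondition of close_auction can be sketched as below. The Bid record, its field names, and the function pre_close_auction are assumptions made for illustration; in particular the sketch passes the winning bid itself rather than a BidId.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Bid:
    bid_id: int
    amount: int
    placed: bool = True

def is_highest(bids, w):
    """Every placed bid has an amount no greater than that of w."""
    return all(b.amount <= w.amount for b in bids if b.placed)

def pre_close_auction(status, bids, w):
    # The developer's precondition (status = ACTIVE), strengthened with the
    # is_highest clause so that closing cannot violate the invariant.
    return status == "ACTIVE" and is_highest(bids, w)

bids = {Bid(1, 100), Bid(2, 105)}
```

With these two bids, closing with the 105 bid as winner satisfies the precondition, whereas closing with the 100 bid, or closing an auction that is not ACTIVE, does not.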


Similarly, merge also needs to be safe. To illustrate the merge precondition,
let us use our running example. We wish to maintain the invariant that the
highest bid is the winner. Assume a scenario where the local replica declared a
winner and closed the auction. An incoming state from a remote replica contains
a bid with a higher amount. When the two states are merged, we see that the
resulting state is unsafe. So we must strengthen the merge operation with a
precondition. The strengthened precondition looks like this:


$$status = CLOSED \implies \forall\, Bids \in \mathcal{P}(Bids),\ is\_highest(Bids, w)$$
$$\land\ status' = CLOSED \implies \forall\, Bids \in \mathcal{P}(Bids),\ is\_highest(Bids, w)$$


This means that if the status is CLOSED in either of the two states, the winner 
should be the highest bid in any state. This condition ensures that when a winner 
is declared, it is the highest bid among the set of bids in any state at any replica. 

Since merge can happen at any time, it must be the case that its precondition
is always true, i.e., it constitutes an additional invariant. We call this the
concurrency invariant. Now our global invariant consists of two parts: first, the
invariant (Inv), and second, the concurrency invariant ($Inv_{conc}$).


4.1 Invariance Conditions 


The verification conditions in Figure 5 ensure that for any reachable local state
of a replica, the global invariant $Inv \land Inv_{conc}$ is a valid assertion. We
assume the invariant to be a Hoare-logic style assertion over the state of the
object. In a nutshell, all of these conditions check that (i) the precondition of
each of the operations, and that of the merge operation, uphold the global
invariant, and (ii) the global invariant of the object consists of the invariant
and the concurrency invariant (precondition of merge).

[Figure 5 lists the verification conditions (1)-(6); only condition (6) is
legible here.]

$$\forall \sigma, \sigma', \sigma_{new}.\ \big( (\sigma, \sigma') \models Pre_{merge} \land (\sigma, \sigma') \models Inv_{conc} \land [\![merge]\!](\sigma, \sigma') = \sigma_{new} \big) \implies (\sigma_{new}, \sigma') \models Inv_{conc} \qquad (6)$$

Fig. 5: Invariant Conditions


5 Technically, this is at least the weakest-precondition of the update for safety. It
strengthens any a priori precondition that the developer may have set.


We will develop this intuition in what follows. Let us now consider each of
the rules:


— Clearly, the initial state of the object must satisfy the global invariant, this 
is checked by conditions (1) and (4). 


The rest of the rules perform a kind of inductive reasoning. Assuming that we
start in a state that satisfies the global invariant, we need to check that any
state update preserves the validity of said invariant. Importantly, this reasoning
is not circular, since the initial state is known by the rule above to be safe.⁶


— Condition (2) checks that each of the operations, when executed starting 
in a state satisfying its precondition and the invariant, is safe. Notice that 
we require that the precondition of the operation be satisfied in the starting
state. This is the core of the inductive argument alluded to above: all
operations, which as we mentioned in Section 3 execute atomically w.r.t.
concurrency, preserve the invariant Inv.


Other than the execution of operations, the other source of local state changes 
is the execution of the merge function in a replica. It is not true in general that 
for any two given states of an object, the merge should compute a safe state. 
In particular, it could be the case that the merge function needs a precondition 
that is stronger than the conjunction of the invariants in the two states to be 
merged. The following rules deal with these cases. 


— We require the merge function to be annotated with a precondition strong 
enough to guarantee that merge will result in a safe state. Generally, this
precondition can be obtained by calculating the weakest precondition [9] of
merge w.r.t. the desired invariant. Since merge is the only operation that
requires two states as input, the precondition of merge has two states. We
can then verify that merging two states is safe. This is the purpose of rule (3).


6 Indeed, the proofs of soundness of program logics such as Rely/Guarantee are
typically inductive arguments of this nature.


As per the program model of Section 3, any two replicas can exchange their states
at any given point of time and trigger the execution of a merge operation. Thus,
it must be the case that the precondition of the merge function is enabled at all
times between any two replica local states. Since merge is the only point where
a local replica can observe the result of concurrent operations in other replicas,
we call this a concurrency invariant ($Inv_{conc}$). In other words: the concurrency
invariant is part of the global invariant of the object. This is the main insight
that allows us to reduce the proof of the distributed object to checking that both
the invariant Inv and the concurrency invariant $Inv_{conc}$ are global invariants. In
particular, the latter implies the former, but for exposition purposes we shall
preserve the invariant Inv in the rules.


— Just as we did with the operations above, we now need to check that when- 
ever we have a pair of states that satisfy the concurrency invariant, if one 
of these states changes, the resulting pair still satisfies the concurrency in- 
variant. This is exactly the purpose of rule (5) in the case where the state 
change originates from an operation execution in one of the replicas of the 
pair. This rule is similar to rule (2) above, where the invariant Inv has been
replaced by $Inv_{conc}$, and consequently we have a pair of states.

— Finally, as we did with rule (3), we need to check the case where one of the 
states of a pair of states satisfying $Inv_{conc}$ is updated because of yet another
merge happening (w.r.t. yet another replica) in one of these states. This is
the purpose of rule (6), which is similar to rule (3), with Inv replaced by
$Inv_{conc}$.


As anticipated at the beginning of this section, the reasoning about the
concurrency is performed in a completely local manner, by carefully choosing the
verification conditions, and it avoids the stability blow-up commonly found in
concurrent program logics. The program model and the verification conditions
allow us to effectively reduce the problem of verifying safety of an asynchronous
concurrent distributed system to the modular verification of the global invariant
($Inv \land Inv_{conc}$) as pre- and postconditions of all operations and merge.
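For a finite state space, the six conditions can be checked by brute force. The sketch below reconstructs them from the prose description above (initial state, operations, and merge, each for Inv and for the concurrency invariant); the exact formulas of Figure 5 may differ in detail. The function check_conditions and the toy auction (bid amounts 1 and 2 only, state = (closed, winner, bids)) are hypothetical and serve only as an illustration.

```python
from itertools import product

def check_conditions(states, sigma_i, ops, merge, pre_merge, inv, inv_conc):
    """Return the sorted list of violated condition numbers (empty if none).

    ops is a list of (precondition, function) pairs.
    """
    failed = set()
    if not inv(sigma_i):
        failed.add(1)
    if not inv_conc(sigma_i, sigma_i):
        failed.add(4)
    for s in states:
        for pre, op in ops:
            if pre(s) and inv(s) and not inv(op(s)):
                failed.add(2)
    for s, t in product(states, repeat=2):
        if pre_merge(s, t) and inv(s) and inv(t) and not inv(merge(s, t)):
            failed.add(3)
        if inv_conc(s, t):
            for pre, op in ops:
                if pre(s) and not inv_conc(op(s), t):
                    failed.add(5)
            if pre_merge(s, t) and not inv_conc(merge(s, t), t):
                failed.add(6)
    return sorted(failed)

# Toy auction. Open states carry no winner; closed states carry a winner.
AMOUNTS = (1, 2)
SUBSETS = [frozenset(c) for c in ((), (1,), (2,), (1, 2))]
STATES = ([(False, None, b) for b in SUBSETS]
          + [(True, w, b) for w in AMOUNTS for b in SUBSETS])

def place_bid(amount):
    # Precondition: the auction is still open.
    return (lambda s: not s[0], lambda s: (s[0], s[1], s[2] | {amount}))

# Close: the auction is open and has bids; the highest local bid wins.
CLOSE = (lambda s: not s[0] and len(s[2]) > 0,
         lambda s: (True, max(s[2]), s[2]))

def merge(s, t):
    return (s[0] or t[0], s[1] if s[1] is not None else t[1], s[2] | t[2])

def inv(s):
    return not s[0] or (len(s[2]) > 0 and s[1] == max(s[2]))

def inv_conc(s, t):
    # If either state is closed, its winner dominates the bids of both states.
    both = s[2] | t[2]
    return all(not x[0] or all(a <= x[1] for a in both) for x in (s, t))

violated = check_conditions(STATES, (False, None, frozenset()),
                            [place_bid(a) for a in AMOUNTS] + [CLOSE],
                            merge, inv_conc, inv, inv_conc)
```

On this toy instance only the condition on operations and the concurrency invariant is reported as violated: a bid placed, or an auction closed, at one replica can break the concurrency invariant with respect to another replica's state.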


Proposition 1 (Soundness). The proof rules in equations (1)-(6) guarantee 
that the implementation is safe. 


To conduct an inductive proof of this proposition we need to strengthen the
argument to include the set of observed states as given by the semantics of
Figure 3.


Lemma 5 (Strengthening of Soundness). Assume that the equations (1)-(6) hold
for an implementation of a replicated object with initial state $\Omega_i$. For
any state $(\Omega, S)$ reachable from $(\Omega_i, \{\sigma_i\})$, that is
$(\Omega_i, \{\sigma_i\}) \xrightarrow{*} (\Omega, S)$, we have that:

1. for all states $\sigma, \sigma' \in S$, $(\sigma, \sigma') \models Inv_{conc}$, and
2. for any state $\sigma \in S$, $\sigma \models Inv$.

Corollary 1. The soundness proposition (Proposition 1) is a direct consequence
of Lemma 5.


We remark at this point that there are numerous program logic approaches 
to proving invariants of shared-memory concurrent programs, with Rely/Guar- 
antee [15] and concurrent separation logic [6] underlying many of them. While 
these approaches could be adapted to our use case (propagating-state distributed 
systems), this adaptation is not evident. As an indication of this complexity: one 
would have to predicate about the different states of the different replicas, re- 
state the invariant to talk about these different versions of the state, encode the 
non-deterministic behaviour of merge, etc. Instead, we argue that our specialised 
rules are much simpler, allowing for a purely sequential and modular verification 
that we can mechanise and automate. This reduction in complexity is the main 
theoretical contribution of this paper. 


4.2 Applying the proof rule 


Let us apply the proof methodology to the auction object. Its invariant is the 
following conjunction: 


1. Only an ACTIVE auction can receive bids, and 
2. the highest bid, also unique, wins when the auction is CLOSED. 


Computing the weakest precondition of each update operation for this invariant
is straightforward. For instance, as discussed earlier, close_auction(w) gets the
precondition is_highest(Bids, w), because of Invariant Item 2 above.

Despite local updates to each replica respecting the invariant Inv, Figure 1
showed that it is susceptible to being violated by merging. This is the case if
Bob's $100 bid in Brussels wins, even though Charles concurrently placed a $105
bid in Calgary; this occurred because status became CLOSED in Brussels while
still ACTIVE in Calgary. The weakest precondition of merge for safety expresses
that, if status in either state is CLOSED, the winner should be the bid with the
highest amount in both states. This merge precondition, now called the
concurrency invariant, strengthens the global invariant to be safe in concurrent
executions.
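The Brussels/Calgary scenario can be replayed on a small model. This is a sketch under assumptions: the Auction record, the status ordering, and the merge function (later status wins, bids are unioned, a declared winner is kept) are hypothetical simplifications and not the specification of the auction object itself.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

ORDER = {"INVALID": 0, "ACTIVE": 1, "CLOSED": 2}   # assumed status ordering

@dataclass(frozen=True)
class Auction:
    status: str
    winner: Optional[Tuple[str, int]]        # (bidder, amount) or None
    bids: FrozenSet[Tuple[str, int]]

def is_highest(bids, w):
    return all(amount <= w[1] for _, amount in bids)

def inv(s):
    """A closed auction has a winner whose bid is the highest."""
    return s.status != "CLOSED" or (
        s.winner is not None and is_highest(s.bids, s.winner))

def merge(s, t):
    # Assumed merge: later status wins, bids are unioned, a winner is kept.
    status = max(s.status, t.status, key=ORDER.get)
    return Auction(status, s.winner or t.winner, s.bids | t.bids)

def pre_merge(s, t):
    """If either side is CLOSED, its winner is highest among all bids."""
    all_bids = s.bids | t.bids
    return all(
        x.status != "CLOSED"
        or (x.winner is not None and is_highest(all_bids, x.winner))
        for x in (s, t))

brussels = Auction("CLOSED", ("Bob", 100), frozenset({("Bob", 100)}))
calgary = Auction("ACTIVE", None,
                  frozenset({("Bob", 100), ("Charles", 105)}))
merged = merge(brussels, calgary)
```

Both replica states satisfy the invariant on their own, yet the merged state does not, and the merge precondition correctly rejects this pair of states.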

Let us now consider how this strengthening impacts the local update
operations. Since starting the auction doesn't modify any bids, the operation
trivially preserves it. Placing a bid might violate $Inv_{conc}$ if the auction is
concurrently closed in some other replica; conversely, closing the auction could
also violate $Inv_{conc}$, if a higher bid is concurrently placed in a remote replica.
Thus, the auction object is safe when executed sequentially, but it is unsafe
when updates are concurrent. This indicates the specification has a bug, which we
now proceed to fix.


4.3 Concurrency Control for Invariant Preservation


As we discussed earlier, the preconditions of operations and merge are strength- 
ened in order to be sequentially safe. An object must also preserve the con- 
currency invariant in order to ensure concurrent safety. Violating this indicates 
the presence of a concurrency bug in the specification. In that case, the opera- 
tions that fail to preserve the concurrency invariant might need to synchronise. 
The developer adds the required concurrency control mechanisms as part of the 
state in our model. The modified state is now composed of the state and the 
concurrency control mechanism. 

Recall that in the auction example, placing bids and closing the auction did
not preserve the precondition of merge. This requires strengthening the
specification by adding a concurrency control mechanism to restrict these
operations. We could force them to be strictly sequential, thereby avoiding any
concurrency at all, but this would reduce the availability of the object.

A concurrency control can be better designed with the workload
characteristics in mind. For this particular use case, we know that placing a bid
is a much more frequent operation than closing an auction. Hence we try to
formulate a concurrency control like a readers-writer lock. In order to realise
this we distribute tokens to each replica. As long as a replica has the token, it
can allow placing bids. Closing the auction requires recalling the tokens from
all replicas. This ensures that there are no concurrent bids placed and thus a
winner can be declared, respecting the invariant. The addition of this
concurrency control also updates $Inv_{conc}$. Clearly, all operations must respect
this modification for the specification to be considered safe.
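One possible reading of the token scheme is sketched below. It is an assumption-laden illustration, not the specification from the extended version: tokens are modelled as a growing set of replicas that have released theirs, bids are bare amounts, and all names (TokenAuction, release_token, can_close, ...) are hypothetical.

```python
from dataclasses import dataclass

REPLICAS = frozenset({"r1", "r2", "r3"})

@dataclass(frozen=True)
class TokenAuction:
    bids: frozenset = frozenset()       # bid amounts
    released: frozenset = frozenset()   # replicas that gave up their token
    winner: object = None               # winning amount once closed

def can_place_bid(s, me):
    # A replica may accept bids only while it still holds its token.
    return s.winner is None and me not in s.released

def place_bid(s, me, amount):
    assert can_place_bid(s, me)
    return TokenAuction(s.bids | {amount}, s.released, s.winner)

def release_token(s, me):
    return TokenAuction(s.bids, s.released | {me}, s.winner)

def can_close(s):
    # Closing requires that every replica has released its token.
    return s.winner is None and s.released == REPLICAS and len(s.bids) > 0

def close_auction(s):
    assert can_close(s)
    return TokenAuction(s.bids, s.released, max(s.bids))

def merge(s, t):
    w = s.winner if s.winner is not None else t.winner
    return TokenAuction(s.bids | t.bids, s.released | t.released, w)

# Two replicas take bids concurrently, then all three release their tokens.
s1 = release_token(place_bid(TokenAuction(), "r1", 100), "r1")
s2 = release_token(place_bid(TokenAuction(), "r2", 105), "r2")
s3 = release_token(TokenAuction(), "r3")
closing = merge(merge(s1, s2), s3)      # the closing replica has seen all releases
closed = close_auction(closing)
```

Because a released token travels together with the bids of the replica that released it, the closing replica cannot observe the release without also observing those bids, which is what makes the declared winner safe in this sketch.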

Note that the token model described here restricts availability in order to
ensure safety. Adding efficient synchronization is not a problem that can be
solved with only the application specification in hand; rather, it requires
knowledge of the application dynamics, such as the workload characteristics, and
is part of our future work.

Figure 6 shows the evolution of the modified auction object with concur- 
rency control. The keys shown are the tokens distributed to each replica. When 
a replica wants to close the auction, it can request tokens from other replicas. 
When a replica releases its token, it is indicated by a cross mark on the key. This 
concurrency control mechanism makes sure that the object is safe during con- 
current executions as well. The specification including the concurrency control 
is given in the extended version[23]. 

To summarize, all updates (operations and merge) have to respect the global
invariant ($Inv \land Inv_{conc}$). If an update violates Inv, the developer must
strengthen its precondition. If an update violates $Inv_{conc}$, the developer must
add concurrency control mechanisms.


5 Case Studies 


This section presents three representative examples of different consistency
requirements of several distributed applications. The consensus object is an
example of a coordination-free design, illustrating a safe object with just
eventual consistency. The next example, a distributed lock, shows a design that
maintains a total order, illustrating strong consistency. The final example,
courseware, shows a mix of concurrent operations and operations with restrained
concurrency. This example, similar to our auction example, illustrates
applications that might require coordination for some operations to ensure safety.

Fig. 6: Evolution of state in an auction object with concurrency control

For each case study, we give an overview of the operational semantics infor- 
mally. We then discuss how the design preserves the safety conditions discussed 
in Section 4. We also provide pseudocode for better comprehension. 


5.1 Consensus application 


Consensus is required in distributed systems when all replicas have to agree 
upon a single value. We consider the specification of a consensus object with a 
fixed number of replicas. We assume that replica failures are solved locally by 
redundancy or other means, and all replicas participate. 

The state consists of a boolean flag indicating the result of consensus, and 
a boolean array indicating the votes from replicas. Each replica agrees on a 
proposal by setting its dedicated entry in the boolean array. A replica cannot 
withdraw its agreement. A replica sets the consensus flag when it sees all entries 
of the boolean array set. 

The consistency between the values of agree flag and the boolean array is 
ensured by the invariant. The merge function is the disjunction of the individual 
components. In this case study, we can see that the merge ensures safety without 
any additional precondition. This means that the object is trivially safe under 
concurrent executions. 




Initial state:
  ¬B ∧ ¬flag
Comparison function:
  flag ∨ (¬flag0 ∧ (B ∨ ¬B0))
Invariant:
  flag ⟹ B

{Pre_mark: True}     # no precondition
mark():
  B.me := true

{Pre_agree: B}
agree():
  flag := true

{Pre_merge: True}    # no precondition
merge(B, flag, B0, flag0):
  B := B ∨ B0
  flag := flag ∨ flag0


Fig. 7: Pseudocode for consensus


Initial state:
  ∃r, V.r ∧ t = 0
Comparison function:
  t > t0 ∨ (t = t0 ∧ V = V0)
Invariant:
  ∃r, V.r ∧ ∀r, r0, (V.r ∧ V.r0) ⟹ r = r0

{Pre_transfer: V.me}
transfer(r0):
  t := t + 1
  V.me := false
  V.r0 := true

{Pre_merge: (t = t0 ⟹ V = V0) ∧ (V.me ⟹ t ≥ t0)}
merge((t, V), (t0, V0)):
  t := max(t, t0)
  V := (t0 < t) ? V : V0


Fig. 8: Specification of a distributed lock


The pseudo code of the consensus example is shown in Figure 7. The design 
for consensus can be relaxed, requiring only the majority of replicas to mark 
their boxes. The extension for that is trivial. 
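The consensus pseudocode translates almost directly into executable form. The sketch below is one such transcription, with assumptions: states are immutable records, the precondition B of agree is read as "every entry of B is set", and the exhaustive check is done for two replicas only. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Consensus:
    votes: tuple          # B: one boolean per replica
    flag: bool = False    # result of consensus

def initial(n):
    return Consensus(votes=(False,) * n)

def inv(s):
    # flag => B, reading B as "all entries of B are set".
    return (not s.flag) or all(s.votes)

def mark(s, me):
    v = list(s.votes)
    v[me] = True
    return Consensus(tuple(v), s.flag)

def agree(s):
    assert all(s.votes)   # Pre_agree: B
    return Consensus(s.votes, True)

def merge(s, t):
    votes = tuple(a or b for a, b in zip(s.votes, t.votes))
    return Consensus(votes, s.flag or t.flag)

# Three replicas mark concurrently; one of them merges and agrees.
r0, r1, r2 = (mark(initial(3), i) for i in range(3))
seen_all = merge(merge(r0, r1), r2)
decided = agree(seen_all)

# Exhaustive check for two replicas: merging two safe states is safe,
# without any precondition on merge.
states = [Consensus(v, f)
          for v in product([False, True], repeat=2) for f in (False, True)]
safe = [s for s in states if inv(s)]
merge_is_safe = all(inv(merge(s, t)) for s in safe for t in safe)
```

The exhaustive check over the two-replica state space is consistent with the claim that this object needs no merge precondition.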


5.2 A replicated concurrency control 


We now discuss an object, a distributed lock, that ensures mutual exclusion. We 
use an array of boolean values, one entry per replica, to model a lock. If a replica 
owns the lock, the corresponding array entry is set to true. The lock is transferred 
to any other replica by using the transfer function. The full specification is shown 
in Figure 8. 

We need to ensure that the lock is owned by exactly one replica at any given
point in time, which is the invariant here. For simplicity, we are not considering
failures. In order to preserve safety, we need to enforce a precondition on the
transfer operation such that the operation can only transfer the ownership of
its origin replica. For state inflation, a timestamp associated with the lock is
incremented during each transfer.

A merge of two states of this distributed lock will preserve the state with
the highest timestamp. In order for the merge function to be the least upper
bound, we must specify that if the timestamps of the two states are equal, their
corresponding boolean arrays are also equal. Also, if the origin replica owns the
lock, it has the highest timestamp. The conjunction of these two restrictions,
which form the precondition of merge, $Pre_{merge}$, is the concurrency invariant,
$Inv_{conc}$.

Consider the case of three replicas r1, r2 and r3 sharing a distributed lock.
Assume that initially replica r1 owns the lock. Replicas r2 and r3 concurrently
place a request for the lock. The current owner, r1, has to make a decision on the
priority of the requests based on the business logic. r1 calculates a higher
priority for r3 and transfers the lock to r3. Since r1 no longer has the lock, it
cannot issue any further transfer operations. We see here clearly that the
transfer operation is safe. In the new state, r3 is the only replica that can
perform a transfer operation. We can also note that this prevents any concurrent
transfer operations. This guarantees mutual exclusion and hence ensures safety in
a concurrent execution environment.

An interesting property we can observe from this example is total order. Due 
to the preconditions imposed in order to be safe, we see that the states progress 
through a total order, ordered by the timestamp. The transfer function increases 
the timestamp and merge function preserves the highest timestamp. 
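The three-replica scenario can be replayed with a direct transcription of the lock specification. This is a sketch: replicas are identified by the hypothetical indices R1, R2, R3, the state is an immutable (timestamp, ownership vector) record, and preconditions are checked with assertions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Lock:
    t: int          # timestamp, incremented on each transfer
    owner: tuple    # V: one boolean per replica

def inv(s):
    return sum(s.owner) == 1          # exactly one replica owns the lock

def transfer(s, me, r0):
    assert s.owner[me], "Pre_transfer: only the owner may transfer"
    v = list(s.owner)
    v[me] = False
    v[r0] = True
    return Lock(s.t + 1, tuple(v))

def pre_merge(s, s0, me):
    same_time_same_owner = s.t != s0.t or s.owner == s0.owner
    owner_is_freshest = (not s.owner[me]) or s.t >= s0.t
    return same_time_same_owner and owner_is_freshest

def merge(s, s0):
    # Keep the ownership vector of the state with the highest timestamp.
    return Lock(max(s.t, s0.t), s.owner if s0.t < s.t else s0.owner)

R1, R2, R3 = 0, 1, 2
start = Lock(0, (True, False, False))     # r1 owns the lock initially
at_r1 = transfer(start, me=R1, r0=R3)     # r1 hands the lock to r3
at_r3 = merge(start, at_r1)               # r3 receives r1's new state
```

After the transfer, an attempt by r1 to transfer again fails its precondition, and the states seen at r1 and r3 are ordered by their timestamps.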


5.3 Courseware 


We now look at an application that allows students to register and enroll in a 
course. For space reasons, we elide the pseudocode which can be found in the 
extended version[23]. The state consists of a set of students, a set of courses and 
enrollments of students for different courses. Students can register and deregister, 
courses can be created and deleted, and a student can enroll for a course. The 
invariant requires enrolled students and courses to be registered and created 
respectively. 

Students and courses are each represented by two sets: one to track registrations
or creations and another to track deregistrations or deletions. Registration or
creation monotonically adds the student or course to the registered sets
respectively, and deregistration or deletion monotonically adds them to the
unregistered sets. The semantics currently doesn't support re-registration, but
that can be fixed by using a slightly modified data structure that counts the
number of times the student has been registered/unregistered and decides on the
status of registration accordingly. Enrollment adds the student-course pair to
the set of enrollments. Currently, we do not consider canceling an enrollment,
but it is a trivial extension. Merging two states takes the union of the sets.

Let us consider the safety of each operation. The operations to register a 
student and create a course are safe without any restrictions. Therefore they do 
not need any precondition. The remaining three operations might violate the 
invariant in some cases. This leads to strengthening their preconditions. The
preconditions of the operations for deregistering a student and deleting a course
require that no enrollments exist for them. For enrollment, both the student and
the course should be registered/created and not unregistered/deleted.

Merge also requires strengthening of its precondition. It requires the set of 
enrolled students and courses to be registered and not unregistered in all the 
remote states as well. This is the concurrent invariant ($Inv_{conc}$) for this object.

Running this specification through our tool, which we describe in Section 6,
reveals concurrency issues for deregistering a student, deleting a course and
enrollment. This means that we need to add concurrency control to the state. 
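
The concurrency issue described above can be reproduced with a small sketch. The Python class below is not the pseudocode of the extended version [23]; the class name, field names and method names are assumptions made for illustration. It models the state as grow-only sets, guards the operations with the sequential preconditions discussed above, and merges by set union.

```python
from dataclasses import dataclass, field


@dataclass
class Courseware:
    """Hypothetical state-based courseware replica; every set only grows."""
    registered: set = field(default_factory=set)    # students ever registered
    unregistered: set = field(default_factory=set)  # students ever deregistered
    created: set = field(default_factory=set)       # courses ever created
    deleted: set = field(default_factory=set)       # courses ever deleted
    enrolled: set = field(default_factory=set)      # (student, course) pairs

    def is_student(self, s):
        return s in self.registered and s not in self.unregistered

    def is_course(self, c):
        return c in self.created and c not in self.deleted

    def invariant(self):
        # Enrolled students and courses must be registered and created.
        return all(self.is_student(s) and self.is_course(c)
                   for s, c in self.enrolled)

    def register(self, s):
        self.registered.add(s)

    def create(self, c):
        self.created.add(c)

    def deregister(self, s):
        # Precondition: no existing enrollment for this student.
        assert all(s != s2 for s2, _ in self.enrolled)
        self.unregistered.add(s)

    def delete(self, c):
        # Precondition: no existing enrollment for this course.
        assert all(c != c2 for _, c2 in self.enrolled)
        self.deleted.add(c)

    def enroll(self, s, c):
        # Precondition: student registered and course created.
        assert self.is_student(s) and self.is_course(c)
        self.enrolled.add((s, c))

    def merge(self, other):
        # Merge is the component-wise union of the sets.
        self.registered |= other.registered
        self.unregistered |= other.unregistered
        self.created |= other.created
        self.deleted |= other.deleted
        self.enrolled |= other.enrolled


def conflict_demo():
    """Two replicas each perform a locally safe operation, then merge."""
    a, b = Courseware(), Courseware()
    for replica in (a, b):
        replica.register("alice")
        replica.create("logic")
    a.enroll("alice", "logic")   # precondition holds at replica a
    b.deregister("alice")        # precondition holds at replica b
    safe_before = a.invariant() and b.invariant()
    a.merge(b)
    return safe_before, a.invariant()
```

Each operation is safe in isolation, yet the merged state contains an enrollment for a deregistered student, which is exactly the kind of violation that the sequential preconditions alone cannot rule out.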

For this use case, we know that enrolling will be more frequent than dereg- 
istering a student or deleting a course. So, we model a concurrency control 
mechanism as in the case of the auction object discussed earlier. We assign a 
token to each replica for each student and course, called a student token and 
course token respectively. A replica will have a set of student tokens indicating 
the registered students and course tokens indicating the created courses. In order 
to deregister a student or delete a course, all replicas must have released their 
tokens for that particular student/course. Enroll operations can progress as long 
as the student token and course token are available at the local replica for the 
student and course for that particular enrollment. 
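
One possible encoding of such tokens is sketched below. It is an assumption made for illustration, not the encoding used in the Soteria specification: the replica set `REPLICAS`, the class `TokenView` and its methods are hypothetical. A replica holds a token until it records a release, the set of releases only grows, and merge is set union, so the token information is itself a state-based object.

```python
REPLICAS = ("r1", "r2")  # hypothetical fixed set of replica identifiers


class TokenView:
    """Local view of which replicas have released their token for an item."""

    def __init__(self, me):
        self.me = me
        self.released = set()  # (replica, item) pairs; grows monotonically

    def holds(self, item):
        # A replica holds its token for an item until it releases it.
        return (self.me, item) not in self.released

    def release(self, item):
        self.released.add((self.me, item))

    def can_enroll(self, student, course):
        # Enroll needs both local tokens.
        return self.holds(student) and self.holds(course)

    def can_remove(self, item):
        # Deregister/delete needs every replica to have released its token,
        # as far as this replica knows.
        return all((r, item) in self.released for r in REPLICAS)

    def merge(self, other):
        self.released |= other.released


def token_demo():
    r1, r2 = TokenView("r1"), TokenView("r2")
    trace = [r1.can_enroll("alice", "logic"), r2.can_remove("alice")]
    r1.release("alice")
    r2.release("alice")
    trace.append(r2.can_remove("alice"))   # r2 has not yet seen r1's release
    r2.merge(r1)
    trace.append(r2.can_remove("alice"))   # now all releases are visible
    trace.append(r1.can_enroll("alice", "logic"))
    return trace
```

In this sketch enrollment never needs remote information, which matches the design goal that the frequent operation stays available, while removal waits until all releases have been merged locally.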

This concurrency control mechanism now forms part of the state. The precon- 
ditions of operations and merge are recomputed and the concurrency invariant 
is updated. The edited specification passes all checks and is deemed safe. 


6 Automation 


In this section, we present a tool to automate the verification of invariants as 
discussed in the previous sections. Our tool, called Soteria, is based on the Boogie
[5] verification framework. The input to Soteria is a specification of the object
written as Boogie procedures, augmented with a number of domain-specific an- 
notations needed to check the properties described in Section 4. 

Let us now consider how a distributed object is specified in Soteria:


— State: We require the programmer to provide a declaration of the state 
using the global variables in Boogie. The data types can be either built-in 
or user defined. 

— Comparison function: Next we require the programmer to provide a com- 
parison function. This function determines the partial order on states. Again, 
we shall use this comparison function as a basis to check the lattice condi- 
tions, and whether each operation is an inflation on the lattice. We use the 
keyword @gteq to annotate the comparison function in the tool. This com- 
parison function returns true when all the components of the first state are 
greater than or equal to the corresponding components in the other state. It 
is encoded as a function in Boogie. 

— Operations: We require the programmer to provide the implementation of 
the operations of the object. Moreover, for each operation op we require the
programmer to provide the precondition $Pre_{op}$. In general, operations are
encoded as Boogie procedures. Alternatively, we could just require only a 
post-condition describing how the state transitions from the precondition to 
the post-condition. Notice that since in our program model operations are 
atomic, this is an unambiguous encoding of the operations. 

A few things are important in this code. The specification declares opera- 
tions that can modify the contents of the global variables as declared in the 
modifies clause. Preconditions are annotated with the requires clauses, 
and the postcondition is specified by the ensures clauses. The semantics of 
multiple requires and ensures clauses is conjunction. 

— Merge function: We require the special merge operation to be distinguished
from other operations. To that end, we use the annotation @merge.
As mentioned before, the precondition of merge can be obtained by
calculating the weakest precondition that ensures safety. However, the current
version of Soteria does not perform this step automatically; it relies on the
developer to provide the precondition. Notice that, as we argued in Section 4.1,
Soteria will consider this as the concurrency invariant ($Inv_{conc}$).
While in Section 3 we mentioned that the merge procedure takes two states 
as arguments, in the specification input to Soteria, the procedure merge takes 
only one state as the argument. This is because this procedure assumes that 
the merge is being applied in a replica, and therefore, the local state of the 
replica is captured by the global variables. 

— Invariant: Clearly, we require the programmer to provide the invariant to 
be verified by the tool. This invariant is simply provided as a Boogie asser- 
tion over the state of the object. Once more, we require the invariant to be 
annotated with the special keyword @invariant. 


While these are the components required by Soteria to check the safety, often 
Boogie requires additional information to verify the procedures. Some of these 
components are: 


— User-defined data types, 

— Constants to declare special objects such as the origin replica me, or to 
bound the quantifiers, 

— We sometimes make recourse to inductively-defined functions over aggregate 
data structures, for instance, to obtain the maximum in a set of values. Since 
we would like to use these functions in the specifications, we axiomatise their 
semantics to enable the SMT solver used by Boogie to discharge our proof 
obligations. This is particularly important for list comprehensions, and array 
operations. We follow the approach of Leino et al.[18]. 

— When we iterate over lists, arrays or matrices, we need to provide Boogie 
with loop invariants. Loops are part of the programs, and thus, verified by 
Boogie. 


6.1 Verification passes 


The verification of a specification is performed in multiple stages. Let us consider 
these in order: 
1. Syntax checks
The first simple checks validate that the specification provided respects Boo- 
gie syntax when ignoring Soteria annotations. It also calls Boogie to validate 
that the types are correct and that the pre/post conditions provided are 
sound. 
Then it checks that the specification provides all the elements necessary for a 
complete specification. Specifically, it checks the function signatures marked 
by @gteq and @invariant and the procedure marked by @merge. 

2. Convergence check 
This stage checks the convergence of the specification. Specifically, it checks 
whether the specification respects Strong Eventual Consistency. The Strong 
Eventual Consistency (SEC) property states that any two replicas that re- 
ceived the same set of updates are in the same state. To guarantee this, 
objects are designed to have certain sufficient properties in the encoding of 
the state [3, 4, 25], which can be summarised as follows: 

— The state space is equipped with an ordering operator, comparing two
states.

— The ordering forms a join-semilattice.

— Each individual operation is an inflation in the semilattice.

— The merge operation, composing states from two replicas, computes the
least-upper-bound of the given states in the semilattice.

We present the conditions formally in the extended version [23].

An alternative is to make use of the CALM theorem [12]. This allows
non-monotonic operations, but requires them to coordinate. However, our aim is
to provide maximum possible availability with SEC.⁷

To ensure these conditions of Strong Eventual Consistency, the tool performs
the following checks:

— That each operation is an inflation. In a nutshell, we prove using Boogie 
the following Hoare-logic triple: 


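assume $\sigma \in Pre_{op}$
call $\sigma_{new} := op(\sigma)$
assert $\sigma_{new} \geq \sigma$

— Merge computes the least upper bound. The verification condition
discharged is shown below:

assume $(\sigma, \sigma') \in Pre_{merge}$
call $\sigma_{new} := merge(\sigma, \sigma')$
assert $\sigma_{new} \geq \sigma \wedge \sigma_{new} \geq \sigma'$
assert $\forall \sigma^{*}.\ \sigma^{*} \geq \sigma \wedge \sigma^{*} \geq \sigma' \Rightarrow \sigma^{*} \geq \sigma_{new}$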


3. Safety check
This stage verifies the safety of the specification as discussed
in Section 4. This stage is divided further into two sub-stages:

— Sequential safety: Soteria checks whether each individual operation is 
safe. This corresponds to the conditions (2) and (3) in Figure 5. The 
verification condition discharged by the tool to ensure sequential safety 
of operations is: 


7 Convergence of our running example is discussed in the extended version [23].

assume $\sigma \in Pre_{op} \wedge Inv$
call $\sigma_{new} := op(\sigma)$
assert $\sigma_{new} \in Inv$

The special case of the merge function is verified with the following
verification condition:

assume $(\sigma, \sigma') \in Pre_{merge} \wedge \sigma \in Inv \wedge \sigma' \in Inv$
call $\sigma_{new} := merge(\sigma, \sigma')$
assert $\sigma_{new} \in Inv$

Notice that in this condition we assume that there are two copies of the
state: the state of the replica applying the merge, and the primed state,
which represents a state arriving from another replica. In case of
failure of the sequential safety check, the designer needs to strengthen 
the precondition of the operation (or merge) which was unsafe. 

— Concurrent safety: Here we check whether each operation upholds the 
precondition of merge. This corresponds to the conditions (5) and (6) in 
Figure 5. Notice that while this check relates to the concurrent behaviour 
of the distributed object, the check itself is completely sequential; it does 
not require reasoning about operations performed by other processes. As 
shown in Section 4, this ensures safety during concurrent operation. 
The verification conditions are: 


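assume $\sigma \in Pre_{op} \wedge Inv \wedge (\sigma, \sigma') \in Inv_{conc}$
call $\sigma_{new} := op(\sigma)$
assert $(\sigma_{new}, \sigma') \in Inv_{conc}$

to validate each operation op, and

assume $(\sigma, \sigma') \in Inv_{conc} \wedge \sigma \in Inv \wedge \sigma' \in Inv$
call $\sigma_{new} := merge(\sigma, \sigma')$
assert $(\sigma_{new}, \sigma') \in Inv_{conc}$
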
to validate a call to merge. If the concurrent safety check fails, the design 
of the distributed object needs a replicated concurrency control mecha- 
nism embedded as part of the state. 
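
To make the shape of these verification conditions concrete, the following sketch checks them by exhaustive enumeration over a tiny finite state space instead of discharging them with Boogie. Everything in it is an assumption chosen for illustration: the object is a bounded counter with one operation `inc`, `merge` takes the maximum, and `LIMIT`, `STATES` and all function names are hypothetical. It is not how Soteria works internally.

```python
from itertools import product

STATES = range(6)  # tiny finite state space standing in for sigma
LIMIT = 4          # hypothetical bound used by the invariant


def geq(a, b):            # plays the role of the @gteq comparison
    return a >= b


def inv(s):               # plays the role of @invariant
    return s <= LIMIT


def pre_inc(s):           # precondition of the only operation
    return s < LIMIT


def inc(s):
    return s + 1


def inv_conc(s, t):       # precondition of merge (concurrency invariant)
    return inv(s) and inv(t)


def merge(s, t):
    return max(s, t)


PAIRS = list(product(STATES, repeat=2))


def check_inflation(pre=pre_inc):
    # Each operation must move the state upwards in the order.
    return all(geq(inc(s), s) for s in STATES if pre(s))


def check_lub():
    # Merge must be an upper bound and below every other upper bound.
    for s, t in PAIRS:
        if not inv_conc(s, t):
            continue
        m = merge(s, t)
        if not (geq(m, s) and geq(m, t)):
            return False
        if any(geq(u, s) and geq(u, t) and not geq(u, m) for u in STATES):
            return False
    return True


def check_sequential(pre=pre_inc):
    # Operations and merge preserve the invariant.
    ops = all(inv(inc(s)) for s in STATES if pre(s) and inv(s))
    mrg = all(inv(merge(s, t)) for s, t in PAIRS
              if inv_conc(s, t) and inv(s) and inv(t))
    return ops and mrg


def check_concurrent(pre=pre_inc):
    # Operations and merge preserve the precondition of merge.
    ops = all(inv_conc(inc(s), t) for s, t in PAIRS
              if pre(s) and inv(s) and inv_conc(s, t))
    mrg = all(inv_conc(merge(s, t), t) for s, t in PAIRS
              if inv_conc(s, t) and inv(s) and inv(t))
    return ops and mrg
```

Calling `check_sequential(pre=lambda s: True)` drops the precondition of `inc` and makes the sequential check fail, mirroring the situation in which the designer has to strengthen a precondition.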


When all checks are validated, the tool reports that the specification is safe. 
Whenever a check fails, Soteria provides a counterexample⁸ along with a
failure message tailored to the type of check. This can help the developer identify
issues with the specification and fix them.

Once the invariants and specification of an application are given, Soteria is
fully automatic, thanks to Z3, a fully automated SMT solver. The specification
of the application includes the state and all the operations (including merge)
with their pre- and postconditions. In case the invariant cannot be proven,
Soteria provides counter-examples. The programmer can leverage these to up- 
date the specification with appropriate concurrency control, rerun Soteria, and 
so on until the application is correct. As far as the proof system is concerned, no 
programmer involvement is required. Currently, the effort of adding the required 
synchronization conditions is manual, but as the next step, we are working on 


8 Soteria uses the counter model provided by Boogie. 
automating the efficient generation of synchronization control considering the
workload characteristics. The tool and the full specifications in the form of the
tool input are available at Soteria [22].⁹


7 Related Work 


Several works have concentrated on the formalisation and specification of even- 
tually consistent systems [7, 8, 27] to mention but a few. 

A number of works concentrate on the specification and correct implementation
of replicated data types [10, 14]. Unlike these works, we are not concerned
with the correctness of the data type implementation with respect to a
specification, but rather with proving properties that hold for a distributed object.

Gotsman et al.[11] present a proof methodology for proving invariants of 
distributed objects. In fact, that work has been extended with a tool called 
CISE [24] which, similar to Soteria, performs the check using an SMT solver as 
a backend. Another more user-friendly tool was developed by Marcelino et al. [19]
based on the principle of CISE. It is named the Correct Eventual Consistency (CEC)
Tool. The tool is based on the Boogie verification framework and also proposes sets
of tokens that the developer might use. An improved token generation using
the counterexamples generated by Boogie is discussed by Nair and Shapiro [20].

Unlike our work, CISE and CEC (and more generally the work of Gots- 
man et al.[11]) consider the implementation of operation-based objects. As a 
consequence, they assume that the underlying network model ensures causal 
consistency, and the proof methodology therein presented requires reasoning 
about concurrent behaviours (reflected as stability verification conditions on as- 
sertions). We position Soteria as a complementary tool to CISE, since CISE is 
not well-adapted to reason about systems that propagate state, and Soteria is 
not well-adapted to reason about objects that propagate operations. We con- 
sider, as part of our future work, the use of both CISE and Soteria in tandem 
to prove properties depending on the implementation of the objects at hand. 

Houshmand et al. [13] extend CISE by lowering the causal consistency
requirements and generating concurrency control protocols. Their approach still
requires reasoning about concurrent behaviours.

As anticipated in Section 4, Bailis et al. [2] introduced the concept of
I-confluence based on a similar system model. I-confluence states that for an
invariant to hold in a lattice-based state-propagating distributed application, 
the set of reachable valid (i.e. invariant preserving) states must be closed under 
operations and merge. This condition is similar to the ones presented in Figure 5. 
However, there is a fundamental difference: while Bailis et al. [2] recognises that 
one needs to consider only reachable states when checking that the merge opera- 
tion satisfies the invariant, they do not provide means to identify these reachable 
states. This is indeed a hard problem. In Soteria, we instead over-approximate 
the set of reachable states by ignoring whether the states are indeed reachable, 


9 Experimental results with verification times are provided in the extended version [23].
but requiring that their merge satisfies the invariant. This is captured in the
concurrency invariant, $Inv_{conc}$, which is synthesised from the user-provided
invariant. How to obtain this invariant is understandably not addressed in Bailis
et al.[2] since no proof technique is provided. Notice that this is a sound approxi- 
mation since it guarantees the invariant is satisfied, and we also verify that every 
operation preserves this condition as shown in Corollary 1. In this sense we say 
that the pre-condition of merge for a given invariant I, is also an invariant of the 
system. It is this abstraction step that makes the analysis performed by Soteria
syntax-driven, automated, and machine-checked. The fact that Soteria is
an analysis of a program is in contrast with I-confluence [2], where no means
to link a given program text to the semantic model, let alone rules to show
that the syntax implies invariant preservation, are provided. In other words,
I-confluence [2] does not provide a program logic, but rather a meta-theoretical
proof about lattice-based state-propagating systems.

Our previous work [21] provides an informal proof methodology for ensuring
safety of Convergent Replicated Data Types (CvRDTs), which are a group of
specialised data structures used to ensure convergence in distributed
programming. This work builds upon it, formalises the proof rules and proves them
sound. We relax the requirement of CvRDTs by allowing the usage of any data
types that together respect the lattice conditions mentioned in Section 3. We
also show several case studies which demonstrate the use of the rule.

A final interesting remark is that we can show how our methodology can 
aid in the verification of distributed objects mediated by concurrency control. 
Some works [16, 17, 26, 27] have considered this problem from the standpoint of 
synthesis, or from the point of view of which mechanisms can be used to check 
a certain property of the system. 


8 Conclusion 


We have presented a sound proof rule to verify invariants of state-based dis- 
tributed objects, i.e., the objects that propagate state. We present the proof 
obligations guaranteeing that the implementation is safe in concurrent execution 
by reducing the problem to checking that each operation of the object satisfies 
a precondition of the merge function of the state. 

We presented Soteria, a tool sitting on top of the Boogie verification frame- 
work. This tool can be used to identify the concurrency bugs in the design of 
a distributed object. Soteria also checks convergence by checking the lattice 
conditions on the state, described by [3]. We have shown multiple compelling 
case-studies showing how Soteria can be leveraged to ensure the correctness of 
distributed objects that propagate state. It would be an interesting next step 
to look into automatic concurrency control synthesis. The synthesised concur- 
rency control can be analysed and adapted dynamically to minimise the cost of 
synchronisation. 
Abstract. Program sketching is a program synthesis paradigm in which
the programmer provides a partial program with holes and assertions. 
The goal of the synthesizer is to automatically find integer values for 
the holes so that the resulting program satisfies the assertions. The most 
popular sketching tool, SKETCH, can efficiently solve complex program 
sketches, but uses an integer encoding that often performs poorly if the 
sketched program manipulates large integer values. In this paper, we 
propose a new solving technique that allows SKETCH to handle large in- 
teger values while retaining its integer encoding. Our technique uses a 
result from number theory, the Chinese Remainder Theorem, to rewrite 
program sketches to only track the remainders of certain variable values 
with respect to several prime numbers. We prove that our transformation 
is sound and that the encodings of the resulting programs are exponentially
more succinct than existing SKETCH encodings. We evaluate our tech- 
nique on a variety of benchmarks manipulating large integer values. Our 
technique provides speedups against both existing SKETCH solvers and 
can solve benchmarks that existing SKETCH solvers cannot handle. 


1 Introduction 


Program synthesis, the art of automatically generating programs that meet a 
user’s intent, promises to increase the productivity of programmers by automat- 
ing tedious, error-prone, and time-consuming tasks. Syntax-guided Synthesis 
(SyGuS) [2], where the search space of possible programs is defined using a gram- 
mar or a domain-specific language, has emerged as a common program synthesis 
paradigm for many synthesis domains. One of the earliest and most successful
syntax-guided program synthesis frameworks is program sketching [19], where (i) the
search space of the synthesis problem is described using a partial program in 
which certain integer constants are left unspecified (represented as holes), and 
(ii) the specification is provided as a set of assertions describing the intended be- 
havior of the program. The goal of the synthesizer is to automatically replace the 
holes in the program with integer values so that the resulting complete program 
satisfies all the assertions. Thanks to its simplicity, program sketching has found 
wide adoption in applications such as data-structure design [20], personalized 
education [18], program repair [7], and many others. 


© The Author(s) 2020 
The most popular sketching tool, SKETCH [21], can efficiently solve complex
program sketches with hundreds of lines of code. However, SKETCH often per- 
forms poorly if the sketched program manipulates large integer values. SKETCH’s 
synthesis is based on an algorithm called counterexample-guided inductive
synthesis (CEGIS) [21]. The CEGIS algorithm iteratively considers a finite set I of
inputs for the program and performs SAT queries to identify values for the holes
so that the resulting program satisfies all the assertions for the inputs in I.
Further SAT queries are then used to verify whether the generated solution is 
correct on all the possible inputs of the program. SKETCH represents integers 
using a unary encoding (a variable for each integer value) so that arithmetic 
computations such as addition, multiplication etc. can be represented efficiently 
in the SAT formulas as lookup operations. This unary encoding, however, results 
in huge formulas for solving sketches with larger integer values as we also observe 
in our evaluation. Recently, an SMT-like technique that extends the SAT solver 
with native integer variables and integer constraints was proposed to alleviate 
this issue in SKETCH. It guesses values for the integer variables and propagates 
them through the integer constraints, and learns from conflict clauses. However, 
this technique does not scale well when the sketches contain complex arithmetic 
operations—e.g., non-linear integer arithmetic. 


In this paper, we propose a program transformation technique that allows 
SKETCH to solve program sketches involving large integer values while retain- 
ing the unary encoding used by the traditional SKETCH solver. Our technique 
rewrites a SKETCH program into an equivalent one that performs computations 
over smaller values. The technique is based on the well-known Chinese Remainder
Theorem, which states that, given distinct prime numbers $p_1, \ldots, p_n$ such
that $N = p_1 \cdot \ldots \cdot p_n$, for every two distinct numbers
$0 \le k_1, k_2 < N$, there exists a $p_i$ such that
$k_1 \bmod p_i \neq k_2 \bmod p_i$. Intuitively, this theorem states that
tracking the modular values of a number smaller than $N$ for each $p_i$ is enough to
uniquely recover the actual value of the number itself. We use this idea to replace
a variable $x$ in the program with $n$ variables $x_{p_1}, \ldots, x_{p_n}$, so that
for every $i$, $x_{p_i} = x \bmod p_i$. Using closure properties of modular
arithmetic we show that, as long as the program uses the operators +, -, *, ==,
tracking the modular values of variables and performing the corresponding
operations on such values is enough to ensure correctness. For example, to reflect
the variable assignment $x = y + z$, we perform the assignment
$x_{p_i} = (y_{p_i} + z_{p_i}) \bmod p_i$, for every $p_i$. Similarly, the Boolean
operation $x == y$ will only hold if $x_{p_i} = y_{p_i}$, for every $p_i$. To
identify what variables and values in the program can be rewritten, we develop
a data-flow analysis that computes what variables may flow into operations that
are not sound in modular arithmetic, e.g., <, >, ≤, and /.
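
The residue-tracking idea can be illustrated with a few lines of Python. This is a minimal sketch written for this explanation, not part of the SKETCH implementation; the prime set and the helper names `residues`, `add`, `sub`, `mul` and `eq` are assumptions.

```python
PRIMES = (2, 3, 5, 7, 11, 13, 17)  # example set of distinct primes


def residues(x):
    """Replace an integer x by the tuple (x mod p) for each prime p."""
    return tuple(x % p for p in PRIMES)


def add(a, b):
    return tuple((u + v) % p for u, v, p in zip(a, b, PRIMES))


def sub(a, b):
    return tuple((u - v) % p for u, v, p in zip(a, b, PRIMES))


def mul(a, b):
    return tuple((u * v) % p for u, v, p in zip(a, b, PRIMES))


def eq(a, b):
    # x == y holds only if all residues agree.
    return a == b


def demo():
    y, z = 1234, -567
    ry, rz = residues(y), residues(z)
    return (
        eq(add(ry, rz), residues(y + z)),  # addition commutes with mod
        eq(sub(ry, rz), residues(y - z)),  # so does subtraction
        eq(mul(ry, rz), residues(y * z)),  # and multiplication
        eq(ry, rz),                        # distinct values are told apart
    )
```

Comparisons such as `<` have no counterpart here: the residue tuples carry no ordering information, which is why variables flowing into such operators cannot be tracked modularly.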


We provide a comprehensive theoretical analysis of the complexity of the 
proposed transformation. First, we derive how many prime numbers are needed 
to track values in a certain integer range. Second, we analyze the number of bits 
required to encode values in the original and rewritten program and show that, 
for the unary encoding used by SKETCH, our technique offers an exponential 
saving in the number of required bits. 
We evaluate our technique on 181 benchmarks from various applications of
program sketching. Our results show that our technique results in significant 
speedups over existing SKETCH solvers and is able to solve 48 benchmarks on 
which SKETCH times out. 


Contributions. In summary, our contributions are: 


— A language IMP-MOD together with a modular semantics that represents in- 
teger values using their remainders for a given set of primes and a proof that 
this semantics is equivalent to the standard integer semantics (§ 4). 

— A data-flow analysis for detecting variables that can be soundly executed in 
the modular semantics and an algorithm for translating IMP programs into 
IMP-MOD ones (§ 5). 

— A synthesis algorithm for IMP-MOD programs and an incremental synthesis
algorithm that lazily increases the number of primes used in the modular
semantics (§ 6).

— A complexity analysis that shows that synthesis for IMP-MOD programs re- 
quires exponentially smaller SAT queries than synthesis in IMP (§ 7). 

— An evaluation of our technique on 181 benchmarks that manipulate large 
integer values. Our solver outperforms the default SKETCH unary solver, it 
can solve 48 new benchmarks that no SKETCH solver can solve, and is 15.9X 
faster than the SKETCH SMT-like integer solver on the hard benchmarks 
that take more than 10 seconds to solve (§ 8). 


An extended version containing all proofs and further details has been uploaded 
to arXiv as supplementary material. 


2 Motivating Example 


In this section, we use a simple example to illustrate our technique and its 
effectiveness. Consider the SKETCH program polyArray presented in Figure 1a.
The goal of this synthesis problem is to synthesize a two-variable quadratic
polynomial (lines 7-8) whose evaluation p on given inputs x and y is equal to a
given expected-output array z (line 9). Solving the problem amounts to finding
non-negative integer values for the holes (??) and sign values, i.e., -1 or 1, for
the sign holes (??±)¹ such that the assertion becomes true. In this case, a possible
solution is the polynomial: 


p[i] = -17*y[i]^2 - 8*x[i]*y[i] - 17*x[i]^2 - 3*x[i];
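
As a quick sanity check, the polynomial above (read as -17*y^2 - 8*x*y - 17*x^2 - 3*x) can be evaluated on the inputs listed in the comments of Figure 1. The snippet is only an illustration; the function names are hypothetical.

```python
# Inputs and expected outputs taken from the comments in Figure 1.
x = [24, -1, 0, -19]
y = [-7, 11, -3, 13]
z = [-9353, -1983, -153, -6977]


def p(xi, yi):
    """Candidate solution polynomial stated in the text."""
    return -17 * yi**2 - 8 * xi * yi - 17 * xi**2 - 3 * xi


def solution_matches():
    # The assertion p[i] == z[i] must hold for every index i.
    return [p(xi, yi) for xi, yi in zip(x, y)] == z
```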


When attempting to solve this problem, the SKETCH synthesizer times out at 
300 seconds. To solve this problem, SKETCH creates SAT queries where the 
variables are the holes. Due to the large numbers involved in the computation of 
this program, the unary encoding of SKETCH ends up with SAT formulas with 
approximately 45 million clauses. 


1 In SKETCH, holes can only assume positive values. This is why we need the sign holes,
which are implemented using regular holes as follows: if (??) then 1 else -1.
 1 // n=4, x=[24,-1,0,-19], y=[-7,11,-3,13]
 2 // z=[-9353,-1983,-153,-6977]
 3 polyArray(int n, int[n] x, int[n] y, int[n] z){
 4   int[n] p;
 5   int i=0;
 6   while (i<n){
 7     p[i]=??±1*??1*y[i]^2+??±2*??2*x[i]^2+??±3*??3*x[i]*y[i]
 8          +??±4*??4*y[i]+??±5*??5*x[i]+??±6*??6;
 9     assert p[i] == z[i];
10     i++; }
11 }


(a) Original sketch program. 


 1 // n=4, x=[24,-1,0,-19], y=[-7,11,-3,13]
 2 // z=[-9353,-1983,-153,-6977]
 3 pAPrime(int n, int[n] x, int[n] y, int[n] z){
 4   int[n] x2,x3,x5,x7,x11,x13,x17;
 5   while (i<n){ // Initialize modular variables
 6     x2[i]=x[i]%2;
 7     x3[i]=x[i]%3;
 8     ... }
 9   int i=0;
10   int[n] p2,p3,p5,p7,p11,p13,p17;
11   while (i<n){
12     p2[i]=(??±1*(??1%2)*(y2[i]^2%2)%2
13           +??±2*(??2%2)*(x2[i]^2%2)%2
14           +??±3*(??3%2)*(x2[i]%2)*(y2[i]%2)%2
15           +??±4*(??4%2)*(y2[i]%2)%2
16           +??±5*(??5%2)*(x2[i]%2)%2
17           +??±6*(??6%2)%2)%2;
18     ...
19     assert p2[i] == z2[i];
20     assert p3[i] == z3[i];
21     ...
22     i++; }
23 }


(b) Rewritten sketch program. 


Fig. 1: SKETCH program (a) and rewritten version with values tracked for differ- 
ent moduli (b). 


Sketch Program with Modular Arithmetic. The technique we propose in this paper
has the goal of reducing the complexity of the synthesis problem by transforming
the program into an equivalent one that manipulates smaller integer values and
that yields easier SAT queries. Given the SKETCH program in Figure 1a, our
technique produces the modified SKETCH program pAPrime in Figure 1b. The
new SKETCH program has the same control flow graph as the original one, but 
instead of computing the actual values of the expressions x[·] and y[·], it tracks
their remainders for the set of prime numbers {2,3,5,7,11,13,17} using new 
variables—e.g., x2[i] tracks the remainder of x[i] modulo 2. 

The program pAPrime initializes the modular variables with the correspond- 
ing modular values (lines 5-8). When rewriting a computation over modular 
variables, the same computation is performed modularly (lines 12-17). For
example, the term ??±1*??1*y[i]^2, when tracked modulo 2, is rewritten as

(??±1*(??1%2)*((y2[i]%2)^2%2))%2


In the rewritten program, the variables i and n are not tracked modularly, 
since such a transformation would incorrectly access array indices. Finally, the 
assertions for different moduli share the same holes as the solution to the SKETCH 
has to be correct for all modular values. In the rest of the paper, we develop a 
data flow analysis that detects when variables can be tracked modularly. 

SKETCH can solve the rewritten program in less than 2 seconds and produce 
hole values that are correct solutions for the original program. This speedup 
is due to the small integer values manipulated by the modular computations. 
In fact, the intermediate SAT formulas generated by SKETCH for the program 
pAPrime have approximately 120 thousand clauses instead of the 45 million 
clauses for polyArray. Due to the complex arithmetic in the formulas, even if 
SKETCH uses the SMT-like native integer encoding, it still requires more than 
300 seconds to solve this problem. 

While this technique is quite powerful, it does have some limitations. In 
particular, the solution to the rewritten SKETCH is guaranteed to be a correct 
solution only for inputs that cause intermediate values of the program to be in
a range $[d_1, d_2]$ such that
$d_2 - d_1 < 2 \times 3 \times 5 \times 7 \times 11 \times 13 \times 17 = 510{,}510$.
We will prove this result in Section 4.
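
The range restriction can be made tangible with a small sketch: residues modulo the seven primes identify a value only up to multiples of their product, so a value can be recovered only once a window of width N is fixed. The helper names below are assumptions, and the sieve-style reconstruction is a standard way to apply the Chinese Remainder Theorem, not necessarily what the authors' implementation does.

```python
from math import prod

PRIMES = (2, 3, 5, 7, 11, 13, 17)
N = prod(PRIMES)  # product of the primes, the width of the safe window


def residues(v):
    return tuple(v % p for p in PRIMES)


def recover(res, d1):
    """Return the unique v in [d1, d1 + N) whose residues equal res."""
    v, step = 0, 1
    for r, p in zip(res, PRIMES):
        # Advance in steps that preserve the residues fixed so far.
        while v % p != r:
            v += step
        step *= p
    # v is the representative in [0, N); shift it into the requested window.
    return d1 + (v - d1) % N
```

For instance, `recover(residues(-9353), -20000)` yields -9353 again, whereas 7 and 7 + N have identical residues and cannot be distinguished, which is why intermediate values must stay inside a window narrower than N.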


3 Preliminaries 


In this section, we describe the IMP language that we will consider through- 
out the paper and briefly recall the counter-example guided inductive synthesis 
algorithm employed by the SKETCH solver. 

For simplicity, we consider a simple imperative language IMP with integer 
holes for defining the hypothesis space of programs. The syntax and semantics 
of IMP are shown in Appendix ??. Without loss of generality, we assume the
program consists of a single program $f(v_1, \ldots, v_n, ??_1, \ldots, ??_m)$ with
n integer variables and m integer holes. The body of the program f consists of a
sequence of statements, where a statement s can either be a variable assignment, a
while loop statement, an if conditional statement, or an assert statement. The holes
?? denote integer constant values that are unknown and the goal of the synthesis
process is to compute these values such that a set of desired program assertions
is satisfied for all possible input values to f.²


2 Our implementation also supports for-loops, recursion, arrays, and complex types.
Example 1. An example IMP sketch denoting a partial program is shown below.
triple(n,h,??){ h=??; assert h*n==n+n+n; }


The goal of the synthesizer is to compute the value of the hole ?? such that the 
assertion is true for all possible input values of n and h. For this example, ?? = 3 
is a valid solution. 


The SKETCH solver uses the counter-example guided inductive synthesis al- 
gorithm (CEGIS) to find hole values such that the desired assertions hold for all 
input values. Formally, the SKETCH synthesizer solves the following constraint: 


A?? = (771,--+, ??m)EZ™. Vine. [f Gin, 27)" AL 


where $\mathbb{Z}$ denotes the domain of all integer values, $??$ denotes the list of unknown 
hole values $(??_1, \ldots, ??_m) \in \mathbb{Z}^m$, $\mathbb{I}$ denotes the domain of all input argument 
values to the function $f$, and $[\![f(in, ??)]\!]^{IMP} \neq \bot$ denotes that the program 
satisfies all assertions. The synthesis problem is in general undecidable for a language 
with complex operations such as IMP because the domains of 
possible hole and input values are infinite. To make the synthesis process more tractable, 
SKETCH imposes a bound on the sizes of both the input domain ($\mathbb{I}_b$) and the 
domain of hole values ($\mathbb{Z}_b$) to obtain the following constraint: 


$$\exists ?? = (??_1, \ldots, ??_m) \in \mathbb{Z}_b^m.\ \forall in \in \mathbb{I}_b.\ [\![f(in, ??)]\!]^{IMP} \neq \bot$$


The bounded domains make the synthesis problem decidable, but the second- 
order quantified formula results in a search space of hole values that is still huge 
for any reasonable bounds. To solve such bounded equations efficiently, SKETCH 
uses the CEGIS algorithm to incrementally add inputs from the domain until 
obtaining hole values ?? that satisfy the assertion predicates for all the input 
values in the bounded domain. The algorithm solves the second-order formula 
by iteratively solving a series of first-order queries. It first encodes the existential 
query (synthesis query) over a randomly selected input value $in_0$ to find the hole 
values $H$ that satisfy the predicate for $in_0$ using a SAT solver in the backend. 


$$\exists ?? = (??_1, \ldots, ??_m) \in \mathbb{Z}_b^m.\ [\![f(in_0, ??)]\!]^{IMP} \neq \bot$$


It then encodes another existential query (verification query) to find a 
counter-example $in_1$ for which the predicate is not satisfied by the previously found hole 
values $H$. 


$$\exists in \in \mathbb{I}_b.\ \neg\big([\![f(in, H)]\!]^{IMP} \neq \bot\big)$$


If no counter-example input can be found, the hole values are returned as the de- 
sired solution. Otherwise, the algorithm computes a new hole value that satisfies 
the assertion for all the counter-example inputs found so far. This process contin- 
ues iteratively until either a desired hole value is found (i.e. no counter-example 
input exists), no satisfiable hole value is found (i.e. the synthesis problem is 
infeasible), or the SAT solver times out. 
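
The loop described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the synthesis and verification queries are answered by brute-force enumeration over small bounded domains rather than by a SAT solver, and the names `cegis`, `triple_ok`, `holes`, and `inputs` are hypothetical, not part of SKETCH.

```python
from itertools import product

def triple_ok(n, h, hole):
    """Body of the sketch from Example 1: h = ??; assert h*n == n+n+n."""
    h = hole
    return h * n == n + n + n

def cegis(check, hole_domain, input_domain, first_input):
    """Return a hole value valid on all of input_domain, or None if infeasible."""
    examples = [first_input]
    while True:
        # Synthesis query: a hole value consistent with every example so far.
        candidate = next((c for c in hole_domain
                          if all(check(*ex, c) for ex in examples)), None)
        if candidate is None:
            return None
        # Verification query: an input on which the candidate fails.
        cex = next((i for i in input_domain if not check(*i, candidate)), None)
        if cex is None:
            return candidate
        examples.append(cex)

inputs = list(product(range(-8, 8), repeat=2))   # bounded input domain
holes = range(0, 32)                             # bounded hole domain
solution = cegis(triple_ok, holes, inputs, first_input=(0, 0))
print(solution)
```

Starting from the uninformative input (0, 0), the first candidate is refuted by a counter-example, and the second synthesis query already pins the hole to the value reported in Example 1.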
Integer Encoding The SKETCH solver can efficiently solve the synthesis 
constraint in many domains, but it does not scale well for sketches manipulating 
large numbers. SKETCH uses a unary encoding to represent integers, where the 
encoded formula contains a variable for each integer value. The unary 
encoding simplifies the representation of complex non-linear arithmetic 
operations. For example, a multiplication can be represented simply as 
a lookup table under this encoding. In practice, the unary encoding yields 
solving times that are orders of magnitude faster than the logarithmic encoding for 
many synthesis problems. However, it also results in huge SAT formulas in the 
presence of large integers. Recently, a new SMT-like technique based on 
extending the SAT solver with native integer variables and constraints was proposed to 
alleviate this issue in SKETCH. As with Boolean variables, this extended 
solver guesses integer values and propagates them through the constraints while 
also learning from conflict clauses. Note that SKETCH uses these SAT extensions 
and encodings instead of an SMT solver because SMT solvers do not scale well for the 
non-linear constraints typically found in synthesis problems. Our new technique 
for handling computations over large numbers still maintains the efficient unary 
encoding of integers and computations over them. 


4 Modular Arithmetic Semantics 


In this section, we present the language IMP-MOD in which variables can be 
tracked using modular arithmetic. We start by recalling the Chinese Remain- 
der Theorem, then define both a modular and integer semantics for the IMP-MOD 
language, and show that the two semantics are equivalent. 


4.1 The Chinese Remainder Theorem 


The Chinese Remainder Theorem is a powerful number theory result that shows 
the following: given a set of distinct primes $P = \{p_1, \ldots, p_k\}$, any number $n$ in 
an interval of size $p_1 \cdot \ldots \cdot p_k$ can be uniquely identified from the remainders 
$[n \bmod p_1, \ldots, n \bmod p_k]$. In Section 4.2, we will use this idea to define the 
semantics of the IMP-MOD language. The main benefit of this idea is that the 
remainders can be much smaller than the actual program values. 


Example 2. For $P = \{3, 5, 7\}$ and the integer 101, the remainders $[2, 1, 3]$ are much 
smaller than 101. However, any number of the form $101 + 105 \times n$ also has 
remainders $[2, 1, 3]$ with respect to the same prime set. 
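
A quick way to check Example 2 is to compute the remainders directly. The helper name `remainders` is an assumption made for this illustration only.

```python
def remainders(x, primes):
    """Map an integer to its list of remainders with respect to the primes."""
    return [x % p for p in primes]

P = [3, 5, 7]
rems_101 = remainders(101, P)

# Every number of the form 101 + 105*n collapses onto the same remainders,
# which is why the mapping is an abstraction when no range is fixed.
shifted = [101 + 105 * n for n in range(-2, 3)]
all_same = all(remainders(v, P) == rems_101 for v in shifted)
print(rems_101, all_same)
```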


In general, one cannot uniquely determine an arbitrary integer value from its 
remainders for some set $P$, i.e., the mapping from a number to its remainders 
is an abstraction in the sense of abstract interpretation [6]. However, if we are 
interested in a limited range of integer values $[L, U)$, one can choose a set of 
primes $P = \{p_1, \ldots, p_k\}$ such that, for values $L \le x < U$, the map $[r_1, \ldots, r_k] \mapsto x$, 
where $x \equiv r_i \bmod p_i$, is an injection. 
Modular Expr  $a^P ::= c^P \mid v^P \mid a_1^P \; op_m \; a_2^P \mid \text{TOPRIME}(a)$ 
Modular Op    $op_m ::= + \mid - \mid *$ 
Arith Expr    $a ::= ?? \mid c \mid v \mid a_1 \; op_a \; a_2$ 
Arith Op      $op_a ::= + \mid - \mid * \mid /$ 
Bool Expr     $b ::= \text{not } b \mid a_1 \; op_c \; a_2 \mid b_1 \text{ and } b_2 \mid b_1 \text{ or } b_2 \mid a_1^P == a_2^P$ 
Comp Op       $op_c ::= < \mid > \mid \le \mid \ge$ 
Stmt          $s ::= v = a \mid v^P = a^P \mid s_1; s_2 \mid \text{while}(b)\ \{s\} \mid \text{if}(b)\ s_1 \text{ else } s_2 \mid \text{assert } b$ 
Program       $P ::= f(v_1, \ldots, v_n, v_1^P, \ldots, v_m^P, ??_1, \ldots, ??_l)\ \{s\}$ 



Fig. 2: Syntax of the IMP-MOD language. 


Theorem 1 (Chinese Remainder Theorem [4]). Let $p_1, \ldots, p_k$ be positive 
integers that are pairwise co-prime (i.e., no two numbers share a divisor larger 
than 1). Denote $N = \prod_{i=1}^{k} p_i$, and let $d, r_1, r_2, \ldots, r_k$ be any integers. Then 
there is one and only one integer $d \le x < d + N$ such that $x \equiv r_i \bmod p_i$ for 
every $1 \le i \le k$. 


We define the translation function $m_P(x) := [x \bmod p_1, \ldots, x \bmod p_k]$ that 
maps an integer to its tuple of remainders with respect to $P$. When $m_P$ is 
bijective on some set $R$, we denote with $m_P^{-1,R} : [0, p_1) \times \cdots \times [0, p_k) \to R$ its 
inverse function. 


Example 3. Let $x$ be an integer in the range $[0, 105)$ (note that $105 = 3 \times 5 \times 7$). 
If we know that the value of $x$ is congruent to $[2, 1, 3]$ modulo $\{3, 5, 7\}$, we can 
uniquely identify the value of $x$ to be 101 by observing that $101 \equiv 2 \bmod 3$, 
$101 \equiv 1 \bmod 5$, and $101 \equiv 3 \bmod 7$. 
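
The inverse mapping used in Example 3 can be computed constructively. The sketch below uses the textbook Chinese Remainder reconstruction; the function names and the `low` parameter (the lower end of the interval $R = [low, low + N)$) are assumptions made for illustration, and the code relies on Python 3.8+ for `math.prod` and for `pow` with a negative exponent.

```python
from math import prod

def to_remainders(x, primes):
    return tuple(x % p for p in primes)

def from_remainders(rems, primes, low=0):
    """Inverse of to_remainders on [low, low + N), N = product of the primes."""
    n_total = prod(primes)
    x = 0
    for r, p in zip(rems, primes):
        q = n_total // p
        # pow(q, -1, p) is the modular inverse of q modulo p.
        x += r * q * pow(q, -1, p)
    # Shift the representative into the requested interval.
    return low + (x - low) % n_total

primes = (3, 5, 7)
in_unsigned_range = from_remainders((2, 1, 3), primes)          # R = [0, 105)
in_signed_range = from_remainders((2, 1, 3), primes, low=-52)   # R = [-52, 53)
print(in_unsigned_range, in_signed_range)
```

The same remainders decode to different integers depending on the chosen interval, which is why the inverse is indexed by the range $R$.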


The following lemma shows that the function mp is closed under addition, 
subtraction and multiplication of integers. 


Lemma 1. For every set of primes $P$, integers $x$ and $y$, and $op \in \{+, -, *\}$, the 
following holds: $m_P(x \; op \; y) = m_P(x) \; op \; m_P(y)$. 
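
Lemma 1 can be sanity-checked exhaustively on a small range. In this sketch the operator on the right-hand side is read as the component-wise operation followed by reduction modulo the matching prime; `m_p` and `lift` are hypothetical helper names.

```python
import operator
from itertools import product

P = (3, 5, 7)

def m_p(x):
    """Translation function: integer to tuple of remainders."""
    return tuple(x % p for p in P)

def lift(op):
    """Apply op component-wise, reducing modulo the matching prime."""
    return lambda xs, ys: tuple(op(a, b) % p for a, b, p in zip(xs, ys, P))

checked = 0
for op in (operator.add, operator.sub, operator.mul):
    for x, y in product(range(-40, 41), repeat=2):
        assert m_p(op(x, y)) == lift(op)(m_p(x), m_p(y))
        checked += 1
print(checked, "cases agree")
```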


4.2 The IMP-MOD Language 


In this section, we define the IMP-MOD language (syntax in Figure 2), a variant 
of the IMP language for which the semantics can be defined using modular 
arithmetic.³ An IMP-MOD program is parametric on a set $P = \{p_1, \ldots, p_k\}$ of distinct 


3 We consider the simple subset for a clear presentation of the semantics, but our 
framework works for the full IMP language (and for more complex language con- 
structs) as we will see in the later sections. 
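$[\![\text{TOPRIME}(a)]\!]^P_{\sigma,\sigma^P} := [\, [\![a]\!]^P_{\sigma,\sigma^P} \bmod p_1, \ldots, [\![a]\!]^P_{\sigma,\sigma^P} \bmod p_k \,]$ 
$[\![v^P]\!]^P_{\sigma,\sigma^P} := \sigma^P(v^P)$ 
$[\![c^P]\!]^P_{\sigma,\sigma^P} := [\, c \bmod p_1, \ldots, c \bmod p_k \,]$ 
$[\![a_1^P \; op_m \; a_2^P]\!]^P_{\sigma,\sigma^P} := [\, (x_1^1 \; op_m \; x_1^2) \bmod p_1, \ldots, (x_k^1 \; op_m \; x_k^2) \bmod p_k \,]$ where $[\![a_i^P]\!]^P_{\sigma,\sigma^P} = [\, x_1^i, \ldots, x_k^i \,]$ 
$[\![a_1^P == a_2^P]\!]^P_{\sigma,\sigma^P} := x_1^1 == x_1^2 \wedge \cdots \wedge x_k^1 == x_k^2$ where $[\![a_i^P]\!]^P_{\sigma,\sigma^P} = [\, x_1^i, \ldots, x_k^i \,]$ 
$[\![c]\!]^P_{\sigma,\sigma^P} := c$ 
$[\![v]\!]^P_{\sigma,\sigma^P} := \sigma(v)$ 
$[\![a_1 \; op_a \; a_2]\!]^P_{\sigma,\sigma^P} := [\![a_1]\!]^P_{\sigma,\sigma^P} \; op_a \; [\![a_2]\!]^P_{\sigma,\sigma^P}$ 
$[\![v = a]\!]^P_{\sigma,\sigma^P} := (\sigma[v \mapsto [\![a]\!]^P_{\sigma,\sigma^P}], \sigma^P)$ 
$[\![v^P = a^P]\!]^P_{\sigma,\sigma^P} := (\sigma, \sigma^P[v^P \mapsto [\![a^P]\!]^P_{\sigma,\sigma^P}])$ 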


Fig. 3: Modular semantics. 


prime numbers. The structure of an IMP-MOD program is similar to an IMP 
program, but IMP-MOD supports two types of variables and arithmetic expressions: 
the regular IMP ones (i.e., $v$, $a$, and $b$), which operate over an integer semantics, 
and the modular ones (i.e., $v^P$, $a^P$, and $b^P$), which take as an additional 
parameter the set of primes $P$ and operate over a modular semantics. The semantics of 
some of the key constructs of IMP-MOD is shown in Figure 3. 

The key idea of the modular semantics is that the value of each program 
variable $v^P$ and arithmetic expression $a^P$ is denoted by a tuple of 
values, one for each prime number $p_i \in P$. For example, the value of the 
constant $c^P$ is represented by the tuple $[c \bmod p_1, \ldots, c \bmod p_k]$, where each 
individual value denotes the remainder of $c$ when divided by the prime number 
$p_i \in P$. Formally, the program $f$ has two sets of variables $V^{\mathbb{Z}} = \{v_1, \ldots, v_n\}$ 
and $V^P = \{v_1^P, \ldots, v_m^P\}$, which contain all the integer and modular variables 
respectively, and a set of holes $H = \{??_1, \ldots, ??_l\}$. The denotation function uses 
two valuation functions: (i) $\sigma : V^{\mathbb{Z}} \cup H \to \mathbb{Z}$, which maps variables and holes 
to integer values, and (ii) $\sigma^P : V^P \to [0, p_1) \times \cdots \times [0, p_k)$, which maps modular 
variables to modular values. The expression TOPRIME($a$) converts the integer value 
of an integer expression $a$ to a modular tuple. Arithmetic expressions $a^P$ are 
computed over modular values, with the result obtained using modular 
arithmetic with respect to the corresponding primes in $P$. Note that the only 
comparison operator allowed over modular expressions is == and that the 
division operator cannot be applied to modular expressions. While the syntax does 
not directly allow holes to be represented modularly (i.e., we do not have 
expressions of the form $??^P$), an expression of the form TOPRIME($??$) effectively 
achieves the objective of representing a hole $??$ modularly. 
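
To make the tuple-based evaluation concrete, here is a minimal evaluator for modular arithmetic expressions. It is a sketch under simplifying assumptions: expressions are encoded as nested Python tuples (a made-up representation), and TOPRIME is restricted to a single integer variable rather than an arbitrary integer expression.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def eval_modular(expr, sigma, sigma_p, primes):
    """Evaluate a modular expression to a tuple of remainders."""
    kind = expr[0]
    if kind == "const":      # c^P
        return tuple(expr[1] % p for p in primes)
    if kind == "var":        # v^P, looked up in the modular valuation
        return sigma_p[expr[1]]
    if kind == "toprime":    # TOPRIME(v) for an integer variable v
        return tuple(sigma[expr[1]] % p for p in primes)
    _, op, lhs, rhs = expr   # a1^P op_m a2^P, computed component-wise
    xs = eval_modular(lhs, sigma, sigma_p, primes)
    ys = eval_modular(rhs, sigma, sigma_p, primes)
    return tuple(OPS[op](x, y) % p for x, y, p in zip(xs, ys, primes))

primes = (3, 5, 7)
sigma = {"i": 4}               # integer valuation
sigma_p = {"x": (2, 1, 3)}     # modular valuation; x stands for 101
expr = ("op", "+", ("op", "+", ("var", "x"), ("toprime", "i")), ("const", 1))
result = eval_modular(expr, sigma, sigma_p, primes)
print(result)
```

The result coincides with the remainders of 101 + 4 + 1 = 106, without ever materializing a value larger than the primes themselves.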


4.3 Equivalence between the two Semantics 


Next, we provide an alternative integer semantics, which applies the IMP integer 
semantics to modular expressions and show that, under some assumptions on 
the values manipulated by the program, the modular and integer semantics are 
equivalent. We will use this result to build our modified synthesis algorithm. 
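$[\![\text{TOPRIME}(a)]\!]_{\sigma_1,\sigma_2} := [\![a]\!]_{\sigma_1,\sigma_2}$ 
$[\![v^P]\!]_{\sigma_1,\sigma_2} := \sigma_2(v^P)$ 
$[\![c^P]\!]_{\sigma_1,\sigma_2} := c$ 
$[\![a_1^P \; op_m \; a_2^P]\!]_{\sigma_1,\sigma_2} := [\![a_1^P]\!]_{\sigma_1,\sigma_2} \; op_m \; [\![a_2^P]\!]_{\sigma_1,\sigma_2}$ 
$[\![a_1^P == a_2^P]\!]_{\sigma_1,\sigma_2} := [\![a_1^P]\!]_{\sigma_1,\sigma_2} == [\![a_2^P]\!]_{\sigma_1,\sigma_2}$ 
$[\![c]\!]_{\sigma_1,\sigma_2} := c$ 
$[\![v]\!]_{\sigma_1,\sigma_2} := \sigma_1(v)$ 
$[\![a_1 \; op_a \; a_2]\!]_{\sigma_1,\sigma_2} := [\![a_1]\!]_{\sigma_1,\sigma_2} \; op_a \; [\![a_2]\!]_{\sigma_1,\sigma_2}$ 
$[\![v = a]\!]_{\sigma_1,\sigma_2} := (\sigma_1[v \mapsto [\![a]\!]_{\sigma_1,\sigma_2}], \sigma_2)$ 
$[\![v^P = a^P]\!]_{\sigma_1,\sigma_2} := (\sigma_1, \sigma_2[v^P \mapsto [\![a^P]\!]_{\sigma_1,\sigma_2}])$ 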


Fig. 4: Integer semantics. 


Integer Semantics The integer semantics of IMP-MOD is shown in Figure 4 
(denoted $[\![\cdot]\!]_{\sigma_1,\sigma_2}$). In this semantics, modular expressions are evaluated as integer 
expressions using the same semantics as for IMP, i.e., the values of modular 
variables and modular arithmetic expressions are denoted by integer values. 
Therefore, in the integer semantics, we use two valuation functions: $\sigma_1 : V^{\mathbb{Z}} \cup H \to \mathbb{Z}$, 
mapping variables and holes to integers, and $\sigma_2 : V^P \to \mathbb{Z}$, mapping modular 
variables to integers. 


Relation between the Two Semantics We now show that the modular semantics 
is, in some sense, equivalent to the integer semantics. For the rest of this section, 
we fix a set of distinct primes $P = \{p_1, \ldots, p_k\}$. 

To prove the equivalence of the two program semantics, we will require the 
values of modular expressions to lie in some range that is covered by the prime 
numbers in P. The following definition captures this restriction. 


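Definition 1. Given a modular arithmetic expression $a^P$ (resp. Boolean 
expression $b$) and some integers $L < U$, we say $a^P$ with context $(\sigma_1, \sigma_2)$ is uniformly in 
the range $R := [L, U)$, written $a^P \in_{\sigma_1,\sigma_2} R$ for short, if under the integer semantics 
all evaluations of modular subexpressions of $a^P$ (resp. $b$) are in the range $R$: 


- $a^P \in_{\sigma_1,\sigma_2} R$ iff $[\![a^P]\!]_{\sigma_1,\sigma_2} \in R$; 
- $a_1^P == a_2^P \in_{\sigma_1,\sigma_2} R$ iff $a_1^P \in_{\sigma_1,\sigma_2} R$ and $a_2^P \in_{\sigma_1,\sigma_2} R$; 
- $b_1 \text{ and } b_2 \in_{\sigma_1,\sigma_2} R$ iff $b_1 \in_{\sigma_1,\sigma_2} R$ and $b_2 \in_{\sigma_1,\sigma_2} R$; 
- $b_1 \text{ or } b_2 \in_{\sigma_1,\sigma_2} R$ iff $b_1 \in_{\sigma_1,\sigma_2} R$ and $b_2 \in_{\sigma_1,\sigma_2} R$; 
- $\text{not } b \in_{\sigma_1,\sigma_2} R$ iff $b \in_{\sigma_1,\sigma_2} R$; 
- $a_1 \; op_c \; a_2 \in_{\sigma_1,\sigma_2} R$ for any arithmetic expressions $a_1$, $a_2$ and operator $op_c$. 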


Given a valuation function $\sigma : V^P \to \mathbb{Z}$, we write $m_P \circ \sigma$ to denote the 
modular valuation obtained by applying the $m_P$ function to $\sigma$, i.e., for every 
$v^P \in V^P$, $(m_P \circ \sigma)(v^P) = m_P(\sigma(v^P))$. Similarly, for a modular valuation function 
$\sigma^P : V^P \to [0, p_1) \times \cdots \times [0, p_k)$, we write $m_P^{-1,R} \circ \sigma^P$ for the integer valuation from 
$V^P$ to $R$ such that, for every $v^P \in V^P$, $(m_P^{-1,R} \circ \sigma^P)(v^P) = m_P^{-1,R}(\sigma^P(v^P))$. The 
following lemma shows that, when the values of modular arithmetic expressions 
lie in an interval of size $N = p_1 \cdot \ldots \cdot p_k$, the modular and integer semantics of 
modular arithmetic expressions are equivalent. 
Lemma 2. Given a set of primes $P = \{p_1, \ldots, p_k\}$, an arithmetic expression 
$a^P$, and two valuation functions $\sigma_1 : V^{\mathbb{Z}} \cup H \to \mathbb{Z}$ and $\sigma_2 : V^P \to \mathbb{Z}$, we have 


$$m_P([\![a^P]\!]_{\sigma_1,\sigma_2}) = [\![a^P]\!]^P_{\sigma_1, m_P \circ \sigma_2}.$$


Moreover, if there exists an interval $R$ of size $N = p_1 \cdot \ldots \cdot p_k$ such that 
$a^P \in_{\sigma_1,\sigma_2} R$, then 


$$m_P^{-1,R}([\![a^P]\!]^P_{\sigma_1, m_P \circ \sigma_2}) = [\![a^P]\!]_{\sigma_1,\sigma_2}.$$


Similarly, we show that the two semantics are also equivalent for Boolean 
expressions. 


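Lemma 3. Given a set of primes $P = \{p_1, \ldots, p_k\}$, an interval $R$ of size 
$N = p_1 \cdot \ldots \cdot p_k$, a Boolean expression $b$, and two valuation functions $\sigma_1 : V^{\mathbb{Z}} \cup H \to \mathbb{Z}$ 
and $\sigma_2 : V^P \to \mathbb{Z}$, if $b \in_{\sigma_1,\sigma_2} R$, then $[\![b]\!]_{\sigma_1,\sigma_2} = [\![b]\!]^P_{\sigma_1, m_P \circ \sigma_2}$. 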


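We are now ready to show the equivalence between the modular semantics 
and the integer semantics for programs $P \in$ IMP-MOD. The semantics of a 
program $P = f(V^{\mathbb{Z}}, V^P, H)\{s\}$ is a map from valuations to valuations, i.e., given 
a valuation $\sigma_1 : V^{\mathbb{Z}} \to \mathbb{Z}$ for integer variables, a valuation $\sigma_2 : V^P \to \mathbb{Z}$ for 
modular variables, and a valuation $\sigma^H : H \to \mathbb{Z}$ for holes, we have 
$[\![P]\!](\sigma_1, \sigma_2, \sigma^H) = [\![s]\!]_{\sigma_1 \cup \sigma^H, \sigma_2}$ and 
$[\![P]\!]^P(\sigma_1, \sigma_2, \sigma^H) = [\![s]\!]^P_{\sigma_1 \cup \sigma^H, m_P \circ \sigma_2}$. Therefore, it is sufficient to 
show that the two semantics are equivalent for any statement $s$. 

The two semantics are equivalent for a statement $s$ if, under the same input 
valuations, the resulting valuations of the two semantics can be translated into each 
other. Formally, given valuations $\sigma_1$, $\sigma_2$ and an interval $R$ of size $N$, we say 
$[\![s]\!]_{\sigma_1,\sigma_2} \equiv_P [\![s]\!]^P_{\sigma_1, m_P \circ \sigma_2}$ iff $\sigma_1' = \sigma_1''$, $m_P \circ \sigma_2' = \sigma^{P\prime}$, and $\sigma_2' = m_P^{-1,R} \circ \sigma^{P\prime}$, where 
$[\![s]\!]_{\sigma_1,\sigma_2} = (\sigma_1', \sigma_2')$ and $[\![s]\!]^P_{\sigma_1, m_P \circ \sigma_2} = (\sigma_1'', \sigma^{P\prime})$. 
We define uniform inclusion for statements. 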


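Definition 2. Given a set of primes $P$, two integers $L < U$, and a statement $s$, 
we say $s$ with context $(\sigma_1, \sigma_2)$ is uniformly in the range $R := [L, U)$, written $s \in_{\sigma_1,\sigma_2} R$ 
for short, if under the integer semantics all evaluations of modular 
subexpressions of $s$ are in the range $R$: 


- $(v^P = a^P) \in_{\sigma_1,\sigma_2} R$ iff $a^P \in_{\sigma_1,\sigma_2} R$. 
- $\text{while}(b)\{s\} \in_{\sigma_1,\sigma_2} R$ iff $s \in_{\sigma_1,\sigma_2} R$ and $b \in_{\sigma_1,\sigma_2} R$. 
- $s_1; s_2 \in_{\sigma_1,\sigma_2} R$ iff $s_1 \in_{\sigma_1,\sigma_2} R$ and $s_2 \in_{\sigma_1,\sigma_2} R$. 
- $\text{if}(b)\ s_1 \text{ else } s_2 \in_{\sigma_1,\sigma_2} R$ iff $s_1 \in_{\sigma_1,\sigma_2} R$, $s_2 \in_{\sigma_1,\sigma_2} R$, and $b \in_{\sigma_1,\sigma_2} R$. 
- $\text{assert } b \in_{\sigma_1,\sigma_2} R$ iff $b \in_{\sigma_1,\sigma_2} R$. 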


At last, the two semantics are equivalent for statements. 


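Theorem 2. Given a set of primes $P = \{p_1, \ldots, p_k\}$, a statement $s$, and two 
valuation functions $\sigma_1 : V^{\mathbb{Z}} \cup H \to \mathbb{Z}$ and $\sigma_2 : V^P \to \mathbb{Z}$, if there exists an interval 
$R$ of size $N$ such that $s \in_{\sigma_1,\sigma_2} R$, then $[\![s]\!]_{\sigma_1,\sigma_2} \equiv_P [\![s]\!]^P_{\sigma_1, m_P \circ \sigma_2}$. 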
Algorithm 1: returns variables that should be tracked using modular/integer 
semantics. 


/* f: sketched function, $V^P$: variables to be tracked modularly, 
   $V^{\mathbb{Z}}$: variables to be tracked with integer values */ 
function DataFlowAnalysis(f) 
    $S \leftarrow \{/, <, >, \le, \ge\}$; $V^{\mathbb{Z}} \leftarrow \emptyset$ 
    for $op \in S$ do 
        /* Compute all variables v that may flow into op */ 
        $V^{\mathbb{Z}} \leftarrow V^{\mathbb{Z}} \cup$ Dataflow(op, f) 
    $V^P \leftarrow V \setminus V^{\mathbb{Z}}$ 
    return $(V^{\mathbb{Z}}, V^P)$ 


5 From IMP to IMP-MOD Programs 


In this section, we develop a data flow analysis for detecting variables in IMP 
programs for which it is sound to track values modularly. We then use this data 
flow analysis to rewrite an IMP program to an equivalent IMP-MOD program. 


5.1 Data Flow Analysis 


The formalization of IMP-MOD in Section 4.2 made it clear that the modular 
semantics is only appropriate when integer values are manipulated using addi- 
tion, multiplication, subtraction, and equality. Other operations like division and 
less-than comparison cannot be computed soundly in modular arithmetic. 


Example 4. Consider an integer variable $x$ with modular value $x_2$ under modulus 
2 and $x_3$ under modulus 3, and an integer variable $y$ with modular values $y_2$, 
$y_3$ under the corresponding moduli. Then the assignment $x = y + y$ implies 
$x_2 = (y_2 + y_2) \bmod 2$ and $x_3 = (y_3 + y_3) \bmod 3$. However, $x = x / y$ does not 
imply $x_2 = (x_2 / y_2) \bmod 2$ and $x_3 = (x_3 / y_3) \bmod 3$. 
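
A concrete pair of values shows why no function on remainders can compute division or less-than. The numbers below are chosen for illustration and do not come from the paper.

```python
MODULI = (2, 3)

def residues(x):
    return tuple(x % m for m in MODULI)

x1, x2, y = 4, 10, 2

# 4 and 10 are indistinguishable modulo (2, 3) ...
same_residues = residues(x1) == residues(x2)
# ... yet their quotients by y have different residues,
quotients_differ = residues(x1 // y) != residues(x2 // y)
# and they compare differently against the same bound.
comparisons_differ = (x1 < 5) != (x2 < 5)
print(same_residues, quotients_differ, comparisons_differ)
```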


We now define a data flow analysis (shown in Algorithm 1) for computing 
which variables in a program must be tracked with the integer semantics (i.e., the 
set $V^{\mathbb{Z}}$) and which variables can be soundly tracked using the modular semantics 
(i.e., the set $V^P$). For each operator $op$ in $\{/, <, >, \le, \ge\}$, the analysis computes 
the set of variables that may flow into the operands of an expression of the form 
$e_1 \; op \; e_2$. In practice, this is done via a backward may analysis, denoted by the Dataflow 
procedure in Algorithm 1. The obtained set of variables must be tracked using 
the integer semantics. The remaining variables will never flow into a problematic 
operator and can therefore be tracked using the modular semantics. 
Implementation Remark Since our implementation also supports arrays and 
recursion, the data flow analysis in Algorithm 1 is inter-procedural and the set $S$ 
also contains the array indexing operator [ ], i.e., given an expression arr[a], if 
a variable $v$ may flow into $a$, then $v$ must be tracked using the integer semantics. 
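
The partition computed by Algorithm 1 can be approximated with a simple fixpoint. The sketch below is a flow-insensitive over-approximation of a backward may analysis, not the inter-procedural analysis of the implementation; the program encoding (`assignments`, `uses`) and the sample loop are assumptions made for illustration.

```python
PROBLEMATIC = {"/", "<", ">", "<=", ">=", "[]"}

def integer_tracked(assignments, uses):
    """assignments: list of (target, set of variables read by the right-hand side).
    uses: list of (operator, set of variables appearing in its operands).
    Returns the variables that may flow into a problematic operator."""
    v_int = set()
    for op, operands in uses:
        if op in PROBLEMATIC:
            v_int |= operands
    changed = True
    while changed:   # propagate backwards through assignments until stable
        changed = False
        for target, sources in assignments:
            if target in v_int and not sources <= v_int:
                v_int |= sources
                changed = True
    return v_int

# Hypothetical loop: i = 0; x = 0; while (i < n) { y = c*i; x = x + y; i = i + 1 }
assignments = [("i", set()), ("x", set()), ("y", {"c", "i"}),
               ("x", {"x", "y"}), ("i", {"i"})]
uses = [("<", {"i", "n"}), ("*", {"c", "i"}), ("+", {"x", "y"}), ("+", {"i"})]
all_vars = {"i", "n", "x", "y", "c"}

v_int = integer_tracked(assignments, uses)
v_mod = all_vars - v_int
print(sorted(v_int), sorted(v_mod))
```

Here only the loop counter and bound need integer tracking; a later comparison on a variable derived from `x` would pull `x`, `y`, and `c` into the integer set as well.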
$R_a(a) =$ 
    $v^P$                                   if $a = v$ and $v \in V^P$ 
    $c^P$                                   if $a = c$ 
    $R_a(a_1) \; op_m \; R_a(a_2)$          if $a = a_1 \; op_m \; a_2$ 
    TOPRIME($a$)                            otherwise 
$R_b(b) =$ 
    $R_a(a_1) == R_a(a_2)$                  if $b = a_1 == a_2$ 
    $R_b(b_1)$ and $R_b(b_2)$               if $b = b_1$ and $b_2$ 
    not $R_b(b_0)$                          if $b =$ not $b_0$ 
    $b$                                     otherwise 
$R_s(s) =$ 
    $R_s(s_1); R_s(s_2)$                    if $s = s_1; s_2$ 
    $v = a$                                 if $s = (v = a)$ and $v \in V^{\mathbb{Z}}$ 
    $v^P = R_a(a)$                          if $s = (v = a)$ and $v \in V^P$ 
    if($R_b(b)$) $R_s(s_0)$ else $R_s(s_1)$ if $s =$ if($b$) $s_0$ else $s_1$ 
    while($R_b(b)$) {$R_s(s)$}              if $s =$ while($b$) {$s$} 
    assert $R_b(b)$                         if $s =$ assert $b$ 


Fig. 5: Subset of rules for the translation from IMP to IMP-MOD programs. Rules 
are parametric in $V^{\mathbb{Z}}$, $V^P$ with $P$: $R_P(f(V, ??)\{s\}) = f(V^{\mathbb{Z}}, V^P, ??)\{R_s(s)\}$. 


Furthermore, while in our formalization we allow variables to be tracked using 
only one of the two semantics, in our implementation, we allow variables to be 
tracked differently (using actual values or modular values) at different program 
points by tracking, for each variable v, the program points for which the actual 
value of v is needed, which is done by using the same data-flow analysis. In this 
case, a variable might initially need to be tracked using actual values but can 
later be tracked using modular values. 


Example 5. Consider the sketch program polyArray in Figure 1b. For this pro- 
gram, Algorithm 1 will return that the variables x and y can be tracked modu- 
larly. However, the variables i and n must be tracked using the integer semantics 
since they are used in a < operation and as array indices. 


5.2 From IMP to IMP-MOD 


Now that we have computed which sets of variables can be tracked modularly, we 
can transform the IMP program into an IMP-MOD program. The transformation 
$R_P$ that rewrites $f$ into an IMP-MOD program is shown in Figure 5. The key idea 
of the program transformation is to use the sets $V^{\mathbb{Z}}$ and $V^P$ to rewrite only those 
variables and sub-expressions of $f$ for which modular arithmetic can be 
performed soundly. 

Once we get a solution for the IMP-MOD program as hole values, we can get 
a solution for the IMP program by mapping the hole to integer values given by 
the integer semantics. 
Example 6. Consider a program where the dataflow analysis computes $V^{\mathbb{Z}} = \{i, n\}$ 
and $V^P = \{x\}$. The statement $x = x + i + 1$ is rewritten to 
$x^P = x^P + \text{TOPRIME}(i) + 1^P$. 
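
The expression rewriting in Example 6 can be mimicked with a small recursive function. This is an illustrative sketch only: expressions are nested tuples `(op, lhs, rhs)`, integer constants, or variable names (an encoding invented here), the output is a string, and every binary sub-expression is parenthesized to keep the rendering unambiguous.

```python
MODULAR_OPS = ("+", "-", "*")

def render(expr):
    """Print an integer expression unchanged."""
    if isinstance(expr, tuple):
        op, lhs, rhs = expr
        return f"({render(lhs)} {op} {render(rhs)})"
    return str(expr)

def rewrite_expr(expr, v_mod):
    """Rewrite an arithmetic expression assigned to a modular variable."""
    if isinstance(expr, int):                 # constant c becomes c^P
        return f"{expr}^P"
    if isinstance(expr, str):                 # variable or hole
        return f"{expr}^P" if expr in v_mod else f"TOPRIME({expr})"
    op, lhs, rhs = expr
    if op in MODULAR_OPS:                     # recurse into both operands
        return f"({rewrite_expr(lhs, v_mod)} {op} {rewrite_expr(rhs, v_mod)})"
    return f"TOPRIME({render(expr)})"         # e.g. division stays integer

v_mod = {"x"}
rhs = rewrite_expr(("+", ("+", "x", "i"), 1), v_mod)
print("x^P =", rhs)
```

A hole written as the string `"??"` is not a modular variable, so it is wrapped as `TOPRIME(??)`, matching the remark at the end of Section 4.2.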


The transformation $R_P$ is sound. 


Theorem 3. Given an IMP program $f$, and sets $V^{\mathbb{Z}}$ and $V^P$ resulting from the 
data flow analysis on $f$, the program $R_P(f)$ is in the IMP-MOD language. 
Moreover, $[\![f]\!]^{IMP} = [\![R_P(f)]\!]$. 


6 Solving IMP-MOD Sketches 


In this section, we discuss how synthesis in the modular semantics relates to syn- 
thesis in the integer semantics and provide an incremental algorithm for solving 
IMP-MOD sketches. 


6.1 Synthesis in IMP-MOD 


Given a set of integers $R$, we say that a variable valuation $\sigma$ is in $R$ (denoted 
$\sigma \in R$) if for every $v$ we have $\sigma(v) \in R$. As in Section 3, 
we assume that the sketch has to be solved for finite ranges of possible 
values for the holes ($R_H$) and the inputs ($R_{in}$). Solving an IMP-MOD problem 
$P = f(V^{\mathbb{Z}}, V^P, H)\{s\}$ for the integer semantics amounts to solving the following 
constraint: 


$$\exists \sigma^H \in R_H.\ \forall \sigma_1, \sigma_2 \in R_{in}.\ [\![s]\!]_{\sigma_1 \cup \sigma^H, \sigma_2} \neq \bot$$



According to Theorem 2, given a set of distinct primes $P = \{p_1, \ldots, p_k\}$ 
and variable valuations $\sigma^H$, $\sigma_1$, and $\sigma_2$, if there exists a range $R$ of size 
$N = p_1 \cdot \ldots \cdot p_k$ such that $s \in_{\sigma_1 \cup \sigma^H, \sigma_2} R$, the modular semantics and the integer 
semantics are equivalent to each other. Using this observation, we can define 
the set of variable valuations for which the two semantics are guaranteed to be 
equivalent: 


$$\mathbb{I}^P = \{(\sigma_1, \sigma_2) \mid \forall \sigma^H \in R_H.\ \exists R.\ |R| = N \wedge s \in_{\sigma_1 \cup \sigma^H, \sigma_2} R\}.$$


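Since for every $\sigma^H \in R_H$ and $\sigma_1, \sigma_2 \in \mathbb{I}^P$ we have that 
$[\![s]\!]^P_{\sigma_1 \cup \sigma^H, m_P \circ \sigma_2} \equiv_P [\![s]\!]_{\sigma_1 \cup \sigma^H, \sigma_2}$, any solution to an IMP-MOD program in the modular semantics is 
also a solution to the following formula in the integer semantics: 


$$\exists \sigma^H \in R_H.\ \forall \sigma_1, \sigma_2 \in \mathbb{I}^P.\ [\![s]\!]_{\sigma_1 \cup \sigma^H, \sigma_2} \neq \bot.$$


When all valuations $\sigma_1, \sigma_2 \in R_{in}$ are also elements of $\mathbb{I}^P$, any solution to 
an IMP-MOD program in the modular semantics is guaranteed to be a correct 
solution under the integer semantics. 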

To summarize, if the synthesizer returns UNSAT for the IMP-MOD program, 
the problem is unrealizable and does not admit a solution. When it returns a solu- 
tion, the solution is correct if it only produces valuations in the range allowed by 




Algorithm 2: Incremental synthesis for IMP-MOD. 


/* f: function, P: set of primes */ 
function IncrementalSynthesis(f, P) 
    $P' \leftarrow \{p_1\}$ 
    $f_{syn} \leftarrow$ Synthesis(f, $P'$) 
    while $\exists p_{cex} \in P.\ \neg$Verify($f_{syn}$, $p_{cex}$) do 
        $P' \leftarrow P' \cup \{p_{cex}\}$ 
        $f_{syn} \leftarrow$ Synthesis(f, $P'$) 
        if $f_{syn}$ == UNSAT then return $\bot$ 
    return $f_{syn}$ 


the choice of prime numbers. In practice, one can use a verifier to check the cor- 
rectness of the synthesized solution and add more prime numbers to the modular 
synthesizer if needed. In fact, this is the main idea behind the counterexample- 
guided inductive synthesis algorithm used by SKETCH (Section 3). 


6.2 Incremental Synthesis Algorithm 


In this section, we propose an incremental synthesis algorithm that builds on 
the following observation. The set of variable valuations for which modular and 
integer semantics are equivalent increases monotonically in the size of P: 


$$P_1 \subseteq P_2 \Rightarrow \mathbb{I}^{P_1} \subseteq \mathbb{I}^{P_2} \qquad (1)$$


Algorithm 2 uses Equation 1 to add prime numbers lazily during the synthesis 
process. The algorithm first constructs a set $P' = \{p_1\}$ with the first prime 
number $p_1 \in P$ and synthesizes a solution that is correct for computations modulo 
the set $P'$. It then checks whether the synthesized solution $f_{syn}$ satisfies the assertions 
with respect to all prime numbers in $P$. If so, $f_{syn}$ is returned as the solution. 
Otherwise, the algorithm finds a prime $p_{cex} \in P$ for which Verify($f_{syn}$, $p_{cex}$) does 
not hold, adds it to the set $P'$, and continues iterating. Due to 
Equation 1, Algorithm 2 is sound and complete with respect to the synthesis 
algorithm that considers the full prime set $P$ all at once. 
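
The following sketch illustrates the lazy addition of primes on a toy linear sketch `assert h1*a + h2 == out`. It is not the SKETCH-based implementation: synthesis is done by brute-force enumeration, and the sketch, the examples, and all function names are assumptions made for this illustration. Unlike Algorithm 2 as printed, it also checks for UNSAT after the very first synthesis call.

```python
from itertools import product

def holds_mod(holes, examples, p):
    """Check the assertion h1*a + h2 == out modulo a single prime p."""
    h1, h2 = holes
    return all((h1 * a + h2 - out) % p == 0 for a, out in examples)

def synthesis(examples, primes, hole_range):
    """First hole assignment that satisfies the assertion modulo every prime."""
    for holes in product(hole_range, repeat=2):
        if all(holds_mod(holes, examples, p) for p in primes):
            return holes
    return None   # UNSAT

def incremental_synthesis(examples, primes, hole_range):
    used = [primes[0]]
    f_syn = synthesis(examples, used, hole_range)
    while f_syn is not None:
        # Look for a prime in P under which the candidate is refuted.
        p_cex = next((p for p in primes
                      if not holds_mod(f_syn, examples, p)), None)
        if p_cex is None:
            return f_syn, used
        used.append(p_cex)
        f_syn = synthesis(examples, used, hole_range)
    return None, used

P = [2, 3, 5, 7, 11, 13, 17]
examples = [(1, 11), (2, 18), (5, 39)]     # generated from 7*a + 4
holes, used = incremental_synthesis(examples, P, range(0, 32))
print(holes, used)
```

On this input only a prefix of the prime set is ever passed to the synthesizer, which is the effect the incremental algorithm aims for.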

In practice, the user could use domain knowledge to estimate a suitable set 
of primes or alternatively use our incremental algorithm to discover appropriate 
prime sets. The set of prime numbers {2, 3, 5,7,11, 13,17} could usually instan- 
tiate a range R that is large enough for most synthesis tasks based on SKETCH. 


7 Complexity of Rewritten Programs 


In this section, we analyze how many bits are necessary to encode numbers for 
both semantics using unary and binary bit-vector encodings of integers (Sec. 7.1 
and 7.2), and show how many prime numbers are necessary in the modular 
semantics to cover values up to a certain bound (Sec. 7.3). The following results 
build upon several number theory results that the reader can consult at [9, 15]. 
7.1 Bit-complexity of Binary Encoding 


In this section, we analyze how many bits are necessary when representing an 
interval of size $N$ in binary in our modular semantics. In the rest of the section, 
we consider the set of primes $P_n = \{p \mid p < n\} = \{p_1, \ldots, p_k\}$ containing the 
prime numbers smaller than $n$. We will show in Section 8 that 
this choice of prime numbers also yields good performance in practice. Concretely, 
we are interested in the magnitude of the number $N = p_1 \cdot \ldots \cdot p_k$ 
and in how many bits are used to represent the numbers in $P_n$. 
We start by introducing the notion of primorial. 


Definition 3 (Primorial). Given a number $n$, the primorial $n\#$ is defined as 
the product of all primes smaller than $n$, i.e., $n\# = \prod_{p \in P_n} p$. 

The primorial captures the size $N$ of the interval covered by the Chinese 
Remainder Theorem when using prime numbers up to $n$. The following number 
theory result gives us a closed form for the primorial and shows that the number 
$N$ has approximately $n$ bits. 


$$n\# = e^{(1+o(1))n} = 2^{(1+o(1))n} \qquad (2)$$


We use another number theory notion to quantify the number of bits in $P_n$. 


Definition 4 (Chebyshev function). Given a number $n$, the Chebyshev 
function $\vartheta(n)$ is the sum of the logarithms of all the prime numbers smaller than 
$n$, i.e., $\vartheta(n) = \sum_{p \in P_n} \log p$. 

The following number theory result relates the primorial to the Chebyshev 
function. 


$$\vartheta(n) = \log(n\#) = \log 2^{(1+o(1))n} = (1 + o(1))n \qquad (3)$$


Aside from rounding errors, the Chebyshev function captures the number of bits 
required to represent the numbers in $P_n$. To obtain a more precise bound on this 
number, we need a bound for the formula $\sum_{p \in P_n} \lceil \log p \rceil$. 
We start by recalling the following fundamental number theory result. 


Theorem 4 (Prime number theorem). The set $P_n$ has size approximately 
$n / \log n$. 


Using Theorem 4, we get the following result. 

$$\sum_{p \in P_n} \lceil \log p \rceil \le n/\log n + \sum_{p \in P_n} \log p \approx (1 + o(1))n \qquad (4)$$


Representing a number $e^n$ in a classic binary encoding requires $\log_2(e^n) = 
(1 + o(1))n$ bits and, combining Equations 2 and 4, we get the following result. 


Theorem 5. Representing a number $2^n$ in binary requires $(1 + o(1))n$ bits under 
both modular and integer semantics. 
Hence, representing a number in binary requires the same number of bits in 
both semantics. 


Example 7. Consider the set $P_{18} = \{2, 3, 5, 7, 11, 13, 17\}$, which can model an 
interval of $N = 510{,}510$ integers (i.e., $n = 18$ in Theorem 5). Representing $N$ in 
binary requires 19 bits while the binary representations of all the primes in $P_{18}$ 
use 22 bits. Both numbers are close to 18 as predicted by the theorem. 


7.2 Bit-complexity of Unary Encoding 


As discussed in Sec. 3, the default SKETCH solver encodes numbers using a unary 
encoding, i.e., SKETCH requires $2^n$ bits to encode the number $2^n$. Representing 
the same number in unary under the modular semantics requires only prime 
numbers smaller than $n$ and therefore $\sum_{p \in P_n} p$ bits. We can then use the following 
closed form to approximate this quantity. 


$$\sum_{p \in P_n} p \sim \frac{n^2}{2 \log n} \qquad (5)$$


Equation 5 yields the following theorem. 


Theorem 6. Representing a number $2^n$ in unary requires $2^n$ bits in the integer 
semantics and approximately $\frac{n^2}{2 \log n}$ bits in the modular semantics. 


These results show that, under a unary encoding, the modular semantics is 
exponentially more succinct than the integer semantics. 


Example 8. Consider again the prime set $P_{18} = \{2, 3, 5, 7, 11, 13, 17\}$, which can 
model an interval of $N = 510{,}510$ integers. Representing $N$ in unary requires 
510,510 bits. On the other hand, the sum of the bits in the unary encoding of 
the primes in $P_{18}$ is 58. 
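
The bit counts quoted in Examples 7 and 8 can be reproduced with a few lines of arithmetic. The helper `primes_below` and the variable names are assumptions made for this sketch; each remainder modulo $p$ is charged $\lceil \log_2 p \rceil$ bits in binary and $p$ bits in unary.

```python
from math import ceil, log2, prod

def primes_below(n):
    """All primes strictly smaller than n, by trial division."""
    return [p for p in range(2, n)
            if all(p % d for d in range(2, int(p ** 0.5) + 1))]

P18 = primes_below(18)
N = prod(P18)

binary_integer = N.bit_length()                     # bits to write N in binary
binary_modular = sum(ceil(log2(p)) for p in P18)    # bits for all remainders
unary_integer = N                                   # one bit per value
unary_modular = sum(P18)                            # one bit per remainder value

print(P18, N)
print(binary_integer, binary_modular, unary_integer, unary_modular)
```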


7.3 Number of Required Primes 


We analyze how many primes are needed to represent a certain number in the 
modular semantics. We start by introducing the following alternative version of 
the primorial. 


Definition 5 (Prime Primorial). For the $n$-th prime number $p_n$, the prime 
primorial $p_n\#$ is defined as the product of the first $n$ primes, i.e., $p_n\# = \prod_{k=1}^{n} p_k$. 


The following known number theory result gives us an approximation for the 
prime primorial. 

$$p_n\# = e^{(1+o(1))\, n \log n} \qquad (6)$$


Notice how the approximation of the primorial differs from that of the prime 
primorial. This is due to the fact that prime numbers are sparse—i.e., the n-th 
prime number is approximately n log n. 

Using Equation 6 we obtain the following result. 
Theorem 7. Representing numbers in an interval of size $N = e^{n \log n}$ in the 
modular semantics requires the first $n$ prime numbers. 


Since the relation $k = n \log n$ does not admit a closed form for $n$, we cannot 
derive exactly how many primes are needed to represent a number $2^k$ with $k$ 
bits. It is however clear from the theorem that the number of required primes 
grows more slowly than $k$. 
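
Although no closed form is available, the number of primes needed for a given bit width is easy to compute numerically. The function below is a hypothetical helper that returns the shortest prefix of the primes whose product reaches $2^k$.

```python
def first_primes_covering(bits):
    """Smallest prefix of the primes whose product is at least 2**bits."""
    chosen, product, candidate = [], 1, 2
    while product < 2 ** bits:
        # chosen holds every prime below candidate, so trial division suffices.
        if all(candidate % p for p in chosen):
            chosen.append(candidate)
            product *= candidate
        candidate += 1
    return chosen

for k in (16, 32, 64):
    primes = first_primes_covering(k)
    print(k, len(primes), sum(primes))
```

For these widths the number of primes stays well below $k$, and the unary cost (the sum of the primes) stays in the hundreds even for 64-bit ranges.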


8 Evaluation 


We implemented a prototype of our technique as a simple compiler in Java. Our 
implementation provides a simplified SKETCH frontend, which only allows the 
limited syntax we support. Given a SKETCH file, our tool rewrites it into a 
different SKETCH file that operates according to the modular semantics. We will use 
UNARY to denote the result obtained by running the default version of SKETCH 
with unary integer encoding on the original SKETCH file, BINARY to denote the 
result obtained by running the version of SKETCH using an SMT-like native 
integer solver based on binary integer encoding, UNARY-P to denote the result of 
running the default SKETCH version on our modified SKETCH file, and 
UNARY-P-INC to denote the result of running the default version of SKETCH on the file 
generated by the incremental version of our algorithm described in Section 6. As 
expected from our theory, the prime technique is not beneficial for the SMT-like 
native integer solver and always results in worse runtime. Therefore, we do not 
present data for this solver. All experiments were performed on a machine with a 
4.0GHz Intel Core i7 CPU and 16GB RAM using SKETCH-1.7.5, and we use a 
timeout value of 300 seconds (we also report out-of-memory errors as timeouts). 
Our evaluation answers the following research questions: 


Q1 How does the performance of UNARY-P compare to UNARY and BINARY? 
Q2 How does the incremental algorithm compare to the non-incremental one? 
Q3 Is UNARY-P’s performance sensitive to the set of selected prime numbers? 
Q4 How many primes are needed by UNARY-P to produce correct solutions? 
Q5 Does UNARY generate larger SAT queries than UNARY-P? 


8.1 Benchmarks 


We perform our evaluation on three families of programs. 


Polynomials The first set of benchmarks contains 81 variants of the polynomial 
synthesis problem presented in Figure 1. The original version of this benchmark 
appears in the SKETCH benchmark suite under the name polynomial.sk. For 
each benchmark, we generate a random polynomial $f$ and random inputs $\{a\}$, and 
take the set $\{(a, f(a))\}$ as the specification. Each benchmark in this set has the 
following parameters: #Ex $\in \{2, 4, 6\}$ is the number of input-output examples used as 
specification, cbits $\in \{5, 6, 7\}$ denotes the number of bits hole values can use, 
exIn $\in \{[-10, 10], [-30, 30], [-50, 50]\}$ denotes the range of randomly generated 
input examples, and coeff $\in \{[-10, 10], [-30, 30], [-50, 50]\}$ denotes the range of 
randomly generated coefficients in the polynomial $f$. 


Invariants The second set of benchmarks contains 46 variants of two invariant 
generation problems obtained from a public set of programs that require poly- 
nomial invariants to be verified [8]. We selected the two programs in which at 
least one variable could be tracked modularly by our tool (the other programs 
involved complex array operations or inequality operators) and turned the verifi- 
cation problems into synthesis problems by asking SKETCH to find a polynomial 
equality (using the program variables) that is an invariant for the loop in the 
program. To control the size of the magnitudes of the inputs, we only require 
the invariants to hold for a fixed set of input examples. 


The first problem, mannadiv, iteratively computes the remainder and the 
quotient of two numbers given as input. The invariant required to verify mannadiv 
is a polynomial equality of degree 2 involving 5 variables. The SKETCH template 
required to describe the space of all polynomial equalities has 32 holes and can- 
not be handled by any of the SKETCH solvers we consider. We therefore simplify 
the invariant synthesis problems in two ways. In the first variant, we reduce the 
ranges of the hole values in the templates by considering cbits ∈ {2,3}. In the 
second variant, we set cbits ∈ {5,6,7}, but reduce the number of missing hole 
values to 4 (i.e., we provide part of the invariant). Each benchmark takes two 
random inputs and we consider the following input ranges {[1, 50], [1, 100]}. In 
total, we have 10 benchmarks for mannadiv. 


The second problem, petter, iteratively computes the sum $\sum_{1 \le i \le n} i^5$ for a 
given input n. The invariant required to verify petter is a polynomial equality 
of degree 6 involving 3 variables. The SKETCH template required to describe all 
such polynomial equalities has 56 holes and cannot be handled by any of the 
SKETCH solvers we consider. We consider the following simplified variants of the 
problem: (i) petter_0 computes $\sum_{1 \le i \le n} 1$ and requires a polynomial invariant 
of degree one, (ii) petter_x computes $\sum_{1 \le i \le n} x$ for a given input variable x 
and requires a polynomial invariant of degree two, (iii) petter_1 computes 
$\sum_{1 \le i \le n} i$ and requires a polynomial invariant of degree two, and (iv) petter_10 
computes $\sum_{1 \le i \le n} i + 1$ and requires a polynomial invariant of degree two. Each 
benchmark takes two random inputs and we consider the following input ranges 
{[1, 10], [1, 100], [1, 1000]}. In total, we have 12 variants of petter, each run for 
values of cbits ∈ {5, 6, 7}, i.e., a total of 36 benchmarks. 


Program Repair The third set of benchmarks contains 54 variants of SKETCH 
problems from the domain of automatic feedback generation for introductory 
programming assignments [7]. Each benchmark corresponds to an incorrect pro- 
gram submitted by a student and the goal of the synthesizer is to find a small 
variation of the program that behaves correctly on a set of test cases. We select 
the 6/11 benchmarks from the tool Qlose [7] for which (i) our implementation 
can support all the features in the program, and (ii) our data flow analysis 
identifies at least one variable that can be tracked modularly. Of the remaining 
benchmarks, 3/11 do not contain variables that can be tracked modularly, and 
2/11 call auxiliary functions that cannot be translated into SKETCH. For each 
program, we consider the original problem and two variants where the integer 
inputs are multiplied by 10 and 100, respectively. Further, for each program 
variant, we impose an assertion specifying that the distance between the original 
program and the repaired program is within a certain bound. We select three 
different bounds for each program: the minimum cost c, c + 100, and c + 200. 


Table 1: Effectiveness of different solvers. SAT (resp. UNSAT) denotes the 
number of benchmarks for which the solver could find a solution (resp. prove 
that no solution exists), while TO denotes the number of timeouts. 

                          Polynomials       Invariants      Program repair 
Solver        Solved    SAT  UNSAT  TO    SAT  UNSAT  TO    SAT  UNSAT  TO 
UNARY         69/181     12      4  65      5      0  41     48      0   6 
BINARY       127/181     70      6   5     17      0  29     34      0  20 
UNARY-P      169/181     73      5   3     41      2   3     48      0   6 
UNARY-P-INC  172/181     73      6   2     41      2   3     50      0   4 


8.2 Performance of UNARY-P 


Table 1 summarizes our comparison. First, we compare the performance of 
UNARY-P and UNARY. We use P = {2,3,5,7,11,13,17}$, which is enough for 
UNARY-P to always find correct solutions (we verify the correctness of a solution 
by instantiating the hole values in the original sketch programs). UNARY can only 
solve 69/181 benchmarks while UNARY-P can solve 169/181. Figure 7a shows a 
scatter plot (log scale) of the solving times for the two techniques: each point 
below the diagonal line denotes a benchmark on which UNARY-P was faster than 
UNARY. Points on the extreme right-hand side of the plot denote timeouts for 
UNARY. When both solvers terminate, UNARY-P (avg. 1.7s) is 6.1X (geometric 
mean) faster than UNARY (avg. 25.0s). 
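
As a brief illustration of the aggregate used above, the following Python sketch computes a geometric-mean speedup from per-benchmark timings. The function name and the sample timings are hypothetical and are not taken from the evaluation.

```python
import math


def geometric_mean_speedup(baseline_times, new_times):
    """Geometric mean of the per-benchmark ratios baseline / new."""
    ratios = [b / n for b, n in zip(baseline_times, new_times)]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical timings in seconds: speedups of 8x and 2x give a geometric mean of 4x.
speedup = geometric_mean_speedup([8.0, 2.0], [1.0, 1.0])
```

Because the geometric mean is taken over per-benchmark ratios, it need not coincide with the ratio of the two averages (here 25.0s / 1.7s ≈ 14.7), which is dominated by the slowest benchmarks.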

Next, we compare the performance of UNARY-P and BINARY (Figure 7b). On 
the 64 easier benchmarks that BINARY can solve in less than 1 second, BINARY 
(avg. 0.55s) outperforms UNARY-P (avg. 2.32s), but UNARY-P still has reasonable 
performance. On the 49 benchmarks that BINARY can solve between 1 and 
10 seconds, UNARY-P (avg. 3.5s) is on average 1.9X faster than BINARY (avg. 
6.9s). Most interestingly, for the 14 harder benchmarks for which BINARY takes 
more than 10 seconds, UNARY-P (avg. 5.7s) is on average 15.9X faster than 
BINARY (avg. 90.9s). Remarkably, UNARY-P solved 43 of the benchmarks (in less 
than 8s each) for which BINARY timed out†, and UNARY-P only timed out for 
two benchmarks that BINARY could solve in less than a second and one 
benchmark that BINARY could solve in 260s. Finally, we would like to highlight that 
for 41/208 benchmarks, even UNARY outperforms BINARY. As expected from 
the discussion throughout the paper, these are benchmarks typically involving 
complex operations but not involving overly large numbers. 


† During our experiment, we observed that BINARY incorrectly reported UNSAT for 
10 satisfiable benchmarks. We reported these benchmarks as timeouts and have 
contacted the authors of SKETCH to address the issue. 

We can now answer Q1. First, UNARY-P consistently outperforms UNARY 
across all benchmarks. Second, UNARY-P outperforms BINARY on hard-to-solve 
problems and can solve problems that BINARY cannot solve; 
e.g., UNARY-P solved 28/46 invariant problems that SKETCH could not solve. 
UNARY-P and BINARY have similar performance on easy problems. 

Comparison to full SMT encoding For completeness, we also compare our ap- 
proach to a tool that uses SMT solvers to model the entire synthesis problem. 
We choose the state-of-the-art SMT-based synthesizer ROSETTE [23] for our 
comparison. ROSETTE is a programming language that encodes verification and 
synthesis constraints written in a domain-specific language into SMT formulas that 
can be solved using SMT solvers. 

We only run ROSETTE on the set of Polynomials because ROSETTE does support 
the theories of integers, but does not have native support for loops, so there 
is no direct way to encode Invariants and Program Repair benchmarks. To our 
knowledge, ROSETTE provides a way to specify the number k it uses to model 
integers and reals as k-bit words, but the user has no control over how many bits 
it uses for unknown holes specifically. So we evaluate 27 instead of 81 variants of 
the polynomial synthesis problem on ROSETTE, i.e., we consider different numbers 
of cbits. 

Fig. 6: ROSETTE vs BINARY (running times in ms, log scale) 

Figure 6 shows the running times (log scale) for ROSETTE and BINARY with 
cbits=6. ROSETTE successfully solved 16/27 benchmarks and it terminates 
quickly (avg. 2.9s) when it can find a solution. However, ROSETTE times out 
on 11 benchmarks for which BINARY terminates. The timeouts are due to the 
fact that ROSETTE employs full SMT encodings that combine multiple theories 
while BINARY uses a SAT solver that is only modified to accommodate SMT-like 
integer constraints. Since we now know full SMT encodings are not as general 
and efficient as the encodings used in SKETCH, we will only evaluate the effec- 
tiveness of our technique based on comparison with BINARY. 

Finally, we tried applying our prime-based technique to ROSETTE and, as 
expected, the technique is not beneficial due to the binary encoding of numbers 
in SMT, and causes all benchmarks to time out. To summarize, (i) SMT solvers 
cannot efficiently handle the synthesis problems considered in this paper, and 
(ii) our technique is better suited for SAT solvers than SMT solvers. 


8.3 Performance of Incremental Solving 


Our implementation of the incremental solver UNARY-P-INC first attempts to 
find a solution with the prime set P = {2,3,5,7}. If the solver returns a correct 
solution, UNARY-P-INC terminates. Otherwise, UNARY-P-INC incrementally adds 
the next prime to P until it finds a correct solution, it proves there is no 
solution, or it times out. UNARY-P-INC is 25.2% (geometric mean) slower 
than UNARY-P (Figure 8 (log scale)). UNARY-P-INC can solve three benchmarks 
for which both UNARY-P and BINARY timed out. To answer Q2, 
UNARY-P-INC and UNARY-P have similar performance. 

Fig. 8: UNARY-P-INC vs UNARY-P (running times in ms, log scale) 
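
The incremental loop described above can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the evaluation: `solve_mod` and `is_correct` are hypothetical callbacks standing in for the modular synthesis query and for the check against the original sketch, the prime budget `max_primes` stands in for the timeout, and treating "no candidate modulo P" as "no solution" is an assumption of the sketch.

```python
from itertools import count


def primes():
    """Yield 2, 3, 5, ... using trial division (fine for a handful of primes)."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n


def solve_incrementally(solve_mod, is_correct, initial=4, max_primes=10):
    """Grow the prime set until solve_mod yields a candidate accepted by is_correct.

    solve_mod(P) returns a candidate, or None if no candidate exists modulo P.
    Returns (solution, P); solution is None when no candidate exists.
    """
    gen = primes()
    prime_set = [next(gen) for _ in range(initial)]  # starts as [2, 3, 5, 7]
    while len(prime_set) <= max_primes:
        candidate = solve_mod(prime_set)
        if candidate is None or is_correct(candidate):
            return candidate, prime_set
        prime_set.append(next(gen))  # candidate overfits: add the next prime
    raise TimeoutError("prime budget exhausted")


# Toy problem (hypothetical): find a hole value h in [0, 512) with 7 * h == 2100.
def toy_solve_mod(prime_set):
    for h in range(512):
        if all((7 * h) % p == 2100 % p for p in prime_set):
            return h
    return None


def toy_is_correct(h):
    return 7 * h == 2100


solution, used = solve_incrementally(toy_solve_mod, toy_is_correct)
```

In the toy problem, h = 0 satisfies the constraint modulo 2, 3, 5 and 7 but not over the integers; adding the prime 11 rules it out and the loop returns h = 300.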


8.4 Varying the Prime Number Set P 


In this experiment, we evaluate how different prime number sets affect UNARY-P. 

We consider the 5 increasing sets of primes: P5 = {2,3,5}, P7 = {2,3,5,7}, 
P11 = {2,3,5,7,11}, P13 = {2,3,5,7,11,13}, and P17 = {2,3,5,7,11,13,17}. 
Figure 9a (log scale) shows the running times for all the polynomial benchmarks 
with cbits=7 (showing all benchmarks would clutter the plot). The points where 
the lines change from dashed to solid denote the number of primes for which the 
algorithm starts yielding correct solutions. As expected, a smaller set of primes 
leads to faster solving times as the resulting constraints are smaller and fewer 
bits are needed for encoding intermediate values. The runtime on average grows 
with the increasing size of the primes. For example, across all benchmarks, using 
P17 takes 23% longer on average than using P11. To answer Q3, UNARY-P is 
slower when using increasingly large sets of primes. 

In terms of correctness, we find that smaller prime sets often yield incorrect 
solutions (P5: 37% correct, P7: 70%, P11: 86%, P13: 97%, and P17: 100%) 
because there is not enough discriminative power with fewer primes and the 
solutions may overfit to the smaller set of intermediate values. It is interesting 
to note that even prime sets of intermediate size often lead to correct solutions 
in practice, which explains some of the speedups observed in the incremental 
synthesis algorithm. To answer Q4, UNARY-P is able to synthesize correct 
solutions even with intermediate sized sets of primes. 

Fig. 9: Performance for different sets of prime numbers. 


Changing Magnitude of Primes We also evaluate the performance of UNARY- 
P when using primes of different magnitudes. We consider the sets of primes 
{11, 17, 19, 23}, {31, 41,47}, and {251, 263}, which define similar integer ranges, 
but pose different trade-offs between the number of used primes and their sizes— 
e.g., the set {251, 263} only uses two very large primes. Since the different sets 
cover similar integer ranges, they all produce correct solutions. Figure 9b (log 
scale) shows the running time of UNARY-P for the same benchmarks as Figure 9a. 
Larger prime sets of smaller prime values require less time to solve than smaller 
prime sets of larger prime values. This result is expected since, in the unary 
encoding of numbers, representing larger numbers requires more bits. 
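
The trade-off can be made concrete with a small Python calculation. By the Chinese Remainder Theorem, a set of distinct primes distinguishes as many integers as the product of its elements. Using the sum of the primes as a rough proxy for the number of bits a unary encoding needs per tracked value is an assumption of this sketch, not a figure reported in the evaluation.

```python
from math import prod


def summarize(primes):
    """Return (size of the covered integer range, rough unary bit count)."""
    return prod(primes), sum(primes)


PRIME_SETS = [(11, 17, 19, 23), (31, 41, 47), (251, 263)]
summary = {ps: summarize(ps) for ps in PRIME_SETS}
# summary[(11, 17, 19, 23)] == (81719, 70)
# summary[(31, 41, 47)]     == (59737, 119)
# summary[(251, 263)]       == (66013, 514)
```

The three sets cover ranges of the same order of magnitude, while the proxy for the encoding size grows from 70 to 514 as the primes get larger, which is in line with the running times observed above.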


8.5 Size of SAT Formulas 


In this experiment, we compare the sizes of the intermediate SAT formulas gen- 
erated by UNARY-P and UNARY. Figure 10a shows a scatter plot (log scale) of 
the number of clauses of the largest intermediate SAT query generated by the 
CEGIS algorithm for the two techniques. We only plot the instances in which 
UNARY was able to produce at least one SAT formula. UNARY produces SAT formulas 
that are on average 19.3X larger than those produced by UNARY-P. To 
answer Q5, as predicted by our theory, UNARY-P produces significantly 
smaller SAT queries than UNARY. 


Performance vs Size of SAT Queries We also evaluate the correlation between 
synthesis time and size of SAT queries. Figure 10b plots the synthesis times of 
both solvers against the sizes of the SAT queries. It is clear that the synthesis 
time increases with larger SAT queries. The plot illustrates how the solving time 
strongly depends on the size of the generated formulas. 

Fig. 10: SAT formula sizes and performance. 


9 Related Work 


Program Sketching Program sketching was designed to automatically synthesize 
efficient bit-vector manipulations from inefficient iterative implementations [21]. 
The SKETCH tool has since been engineered to support complex language fea- 
tures and operations [19]. Thanks to its simplicity, sketching has found wide 
adoption in applications such as optimizing database queries [3], automated 
feedback generation [18], program repair [7], and many others. Our work further 
extends the capabilities of SKETCH in a new direction by leveraging number 
theory results. In particular, our technique allows SKETCH to handle sketches 
manipulating large integer numbers. To the best of our knowledge, our technique 
is the first one that can solve many of the benchmarks presented in this paper. 


Uses of Chinese Remainder Theorem The Chinese Remainder Theorem and its 
derivative corollaries have found wide application in several branches of Com- 
puter Science and, in particular, in Cryptography [11, 26]. 

The idea of using modular arithmetic to abstract integer values has been 
used in program analysis. Since modular fields are finite, they can be used as 
an abstract domain for verifying programs manipulating integers [5]—e.g., the 
abstract domain can track whether a number is even or odd. Our work extends 
this idea to the domain of program synthesis and requires us to solve several 
challenges. First, when used for verifying programs, the modular abstraction is 
used to overapproximate the set of possible values of the program and does not 
need to be precise. In particular, Clark et al. [5] allow program operations that 
are in the IMP language but not in the IMP-MOD language and lose precision when 
modeling such operations—e.g., when performing the assignment x = x/2 the 
value of x mod 2 can be either 0 or 1. Such imprecision is fine in program analysis 
since the abstraction is used to show that a program does not contain a bug, 
i.e., even in the abstract domain, the program behaves fine. In our setting, the 
problem is opposite as we use the abstraction to simplify the synthesis problem 
and provide a theory for when the modular and integer semantics are equivalent. 
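
The loss of precision mentioned above can be observed with a few lines of Python. The helper `residues_after` is hypothetical and simply enumerates concrete values; it is meant as an illustration of the phenomenon, not of the analysis of Clark et al.

```python
def residues_after(op, residue, modulus=2, limit=1000):
    """Residues reachable by applying op to values x with x % modulus == residue."""
    return {op(x) % modulus for x in range(limit) if x % modulus == residue}


# Adding 1 to an even number always gives an odd number: the abstraction stays precise.
after_increment = residues_after(lambda x: x + 1, residue=0)
# Halving an even number can give either parity: the abstraction loses precision.
after_halving = residues_after(lambda x: x // 2, residue=0)
```

Here `after_increment` contains a single residue, whereas `after_halving` contains both 0 and 1, so the value of x mod 2 after x = x/2 is not determined by the value of x mod 2 before the assignment.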


Pruning Spaces in Program Synthesis Many techniques have been proposed to 
prune large search spaces of possible programs [14]. Enumerative synthesis 
techniques [24, 12, 13, 17] enumerate programs in a search space and avoid 
enumerating syntactically and semantically equivalent terms. Some synthesizers such 
as Synquid [16] and Morpheus [10] use refinement types and first-order formu- 
las over specifications of DSL constructs to refute inconsistent programs. Re- 
cently, Wang et al. [25] proposed a technique based on abstraction refinement 
for iteratively refining abstractions to construct synthesis problems of increasing 
complexity for incremental search over a large space of programs. 


Instead of pruning programs in the syntactic space, our technique uses mod- 
ular arithmetic to prune the semantic space—i.e., the complexity of verifying the 
correctness of the synthesized solution—while maintaining the syntactic space 
of programs. Our approach is related to that of Tiwari et al. [22], who present a 
technique for component-based synthesis using dual semantics—where syntactic 
symbols in a language are provided two different semantics to capture differ- 
ent requirements. Our technique is similar in the sense that we also provide an 
additional semantics based on modular arithmetic. However, we formalize our 
analysis based on number theory results and develop it in the context of general- 
purpose SKETCH programs that manipulate integer values, unlike Tiwari et al.’s 
work that is developed for straight-line programs composed of components. 


Synthesis for Large Integer Values Abate et al. propose a modification of the 
CEGIS algorithm for solving syntax-guided synthesis (SyGuS) problems with 
large constants [1]. SyGuS differs from program sketching in how the synthesis 
problem is posed and in the type of programs that can be modeled. In particular, 
in SyGuS one can only describe programs representing SMT formulas and the 
logical specification for the problem can only relate the input and output of the 
program—i.e., there cannot be intermediate assertions within the program. The 
problem setup and the solving algorithms proposed in this paper are orthogonal 
to those of Abate et al. First, we focus on program sketching, which is orthog- 
onal to SyGuS as sketching allows for richer and more generic program spaces 
as well as richer specifications. While it is true that certain synthesis problems 
can be expressed both as sketches and as SyGuS problems, this is not the case 
for our benchmark programs, which use loops, arrays and non-linear integer 
arithmetic, all of which are not supported by SyGuS. Second, our technique is 
motivated by how SKETCH encodes and solves program sketches through SAT 
solving. While the traditional SKETCH encoding can explode for large constants, 
the same encoding allows SKETCH to solve program sketches involving complex 
arithmetic and complex programming constructs. The algorithm proposed by 
Abate et al. iteratively builds SMT (not SAT) formulas that are required to 
be in a decidable logical theory. Such an encoding only works for the restricted 
programming models used in SyGuS problems. 


Abstract. We present a denotational semantics for weak memory concurrency 
that avoids thin-air reads, provides data-race free programs 
with sequentially consistent semantics (DRF-SC), and supports a compositional 
refinement relation for validating optimisations. Our semantics 
identifies false program dependencies that might be removed by compiler 
optimisation, and leaves in place just the dependencies necessary to rule 
out thin-air reads. We show that our dependency calculation can be 
used to rule out thin-air reads in any axiomatic concurrency model, in 
particular C++. We present a tool that automatically evaluates litmus 
tests, show that we can augment C++ to fix the thin-air problem, and 
we prove that our augmentation is compatible with the previously used 
compilation mappings over key processor architectures. We argue that 
our dependency calculation offers a practical route to fixing the long-standing 
problem of thin-air reads in the C++ specification. 


Keywords: Thin-air problem · Weak memory concurrency · Compiler 
Optimisations · Denotational Semantics · Compositionality 


1 Introduction 


It has been a longstanding problem to define the semantics of programming 
languages with shared memory concurrency in a way that does not allow un- 
wanted behaviours — especially observing thin-air values [8,7] — and that does 
not forbid compiler optimisations that are important in practice, as is the case 
with Java and Hotspot [30,29]. Recent attempts [16,11,25,15] have abandoned 
the style of axiomatic models, which is the de facto paradigm of industrial spec- 
ification [8,2,6]. Axiomatic models comprise rules that allow or forbid individual 
program executions. While it is impossible to solve all of the problems in an 
axiomatic setting [7], abandoning it completely casts aside mature tools for 
automatic evaluation [3], automatic test generation [32], and model checking [23], 
as well as the hard-won refinements embodied in existing specifications like C++, 
where problems have been discovered and fixed [8,7,18]. Furthermore, the 
industrial appetite for fundamental change is limited. In this paper we offer a solution 
to the thin-air problem that integrates with existing axiomatic models. 

The thin-air problem in C++ stems from a failure to account for dependen- 
cies [22]: false dependencies are those that optimisation might remove, and real 
dependencies must be left in place to forbid unwanted behaviour [7]. A single 
execution is not sufficient to discern real and false dependencies. A key insight 
from previous work [14,15] is that event structures [33,34] give us a simultane- 
ous overview of all traces at once, allowing us to check whether a write is sure 
to happen in every branch of execution. Unfortunately, previous work does not 
integrate well with axiomatic models, nor lend itself to automatic evaluation. 

To address this, we construct a denotational semantics in which the meaning 
of an entire program is constructed by combining the meanings of its subcom- 
ponents via a compositional function over the program text. This approach can 
be particularly amenable to automatic evaluation, reasoning and compiler certi- 
fication [19,24], and fits with the prevailing axiomatic approach. 

This paper uses this denotational approach to capturing program dependen- 
cies to explore the thin-air problem, resulting in a concrete proposal for fixing 
the thin-air problem in the ISO standard for C++. 


Contributions. There are two parts to the paper. In the first, we develop a deno- 
tational model called “Modular Relaxed Dependencies model” (MRD) and build 
metatheory around it. The model uses a relatively simple account of synchronisa- 
tion, but it demonstrates separation between the calculation of dependency and 
the enforcement of synchronisation. In the second, we evaluate the dependency 
calculation by combining it with the fully-featured axiomatic models RC11 [18] 
and IMM [26]. 
The denotational semantics has the following advantages: 


1. It is the first thin-air solution to support fork/join (§2.2). 

2. It satisfies the DRF-SC property for a compositional model (§5): programs 
without data races behave according to sequential consistency. 

3. It comes with a refinement relation that validates program transformations, 
including the optimisation that makes Hotspot unsound for Java [30,29], and 
a list of others from the Java Causality Tests [27] (§7). 

4. It is shown to be equivalent to a global semantics that first performs a 
dependency calculation and then applies an axiomatic model. 

5. An example in Section 10 illustrates a case in which thin-air values are 
observable in the current state-of-the-art models but forbidden in ours. 


We adopt the dependency calculation from the global semantics of point 4 as 
the basis of our C++ model, which we call MRD-C11. We establish the C++ 
DRF-SC property described in the standard [13] (§9.1) and we provide several 
desirable properties for a solution to the thin-air problem in C++: 

5. We show that our dependency calculation is the first that can be applied 
to any axiomatic model, and in particular the RC11 and IMM models that 
cover C++ concurrency (§8). 

6. Our augmented IMM model, which we call MRD+IMM, is provably 
implementable over x86, Power, ARMv8, ARMv7 and RISC-V, with the compiler 
mappings provided by the IMM [26] (§8.1). 

7. These augmented models of C++ are the first that solve the thin-air problem 
to have a tool that can automatically evaluate litmus tests (§11). 


1.1 Modular Relaxed Dependency by example 


To simplify things for now, we will attach an Init program to the beginning 
of each example to initialise all global variables to zero. Doing this makes the 
semantics non-compositional, but it is a natural starting place and aligns well 
with previous work in the area. Later, after we have made all of our formal 
definitions, we will see why the Init program is not necessary. 

For now, consider a simple programming language where all values are booleans, 
registers (ranging over r) are thread-local, and variables (ranging over x, y) are 
global. Informally, an event structure for a program consists of a directed graph 
of events. Events represent the global variable reads and writes that occur on all 
possible paths that the program can take. This can be built up over the program 
as follows: each write generates a single event, while each read generates two — 
one for each possible value that could be read. These read events are put in 
conflict with each other to indicate that they cannot both happen in a single 
execution, this is indicated with a zig-zag red arrow between the two events. 
Additionally, the event structure tracks true dependencies via an additional re- 
lation which we call semantic dependencies (DP). These are yellow arrows from 
read events to write events. 

For example, consider the program 


(r1 := x; y := r1) (LB1) 


that reads from a variable x and then writes the result to y. The interpretation 
of this program is an event structure depicted as follows: 


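[Figure: event structure of (LB1): the conflicting reads 2: (R x 0) and 4: (R x 1), 
followed in program order by the writes 3: (W y 0) and 5: (W y 1), respectively.] 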


Each event has a unique identifier (the number attached to the box). The 
straight black arrows represent program order, the curved yellow arrows indicate 
a causal dependency between the reads and writes, and the red zigzag represents 
a conflict between two events. If two events are in conflict, then their respective 
continuations are in conflict too. 
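
The construction just described can be mimicked with a small Python data structure. This is only an informal sketch for programs of the shape r := src; dst := r over boolean values; the class and function names are hypothetical, and the formal definitions are given in the later sections.

```python
from dataclasses import dataclass, field


@dataclass
class EventStructure:
    labels: dict = field(default_factory=dict)   # event id -> (kind, variable, value)
    order: set = field(default_factory=set)      # program order (black arrows)
    conflict: set = field(default_factory=set)   # conflict (red zigzag), symmetric
    dp: set = field(default_factory=set)         # semantic dependencies (yellow arrows)


def load_then_store(src, dst, first_id=2, values=(0, 1)):
    """Event structure of `r := src; dst := r`, one read/write pair per value."""
    es = EventStructure()
    reads = []
    eid = first_id
    for v in values:
        r, w = eid, eid + 1
        es.labels[r] = ("R", src, v)
        es.labels[w] = ("W", dst, v)
        es.order.add((r, w))
        es.dp.add((r, w))            # the written value depends on the value read
        reads.append(r)
        eid += 2
    # Reads of different values cannot both happen in a single execution.
    es.conflict = {(a, b) for a in reads for b in reads if a != b}
    return es


lb1 = load_then_store("x", "y")
```

With the default identifiers, `lb1` contains the events 2 to 5 used in the running example: the reads 2 and 4 are in conflict, and the writes 3 and 5 depend on them.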

If we interpret the program Init; (LB1), as below, we get a program where 
the Init event sets the variables to zero. 

[Figure: event structure of Init; (LB1): the Init event 1 precedes the reads 2 and 4, 
which are followed by the writes 3: (W y 0) and 5: (W y 1).] 


In the above event structure, we highlight events {1, 2,3} to identify an exe- 
cution. The green dotted arrow indicates that event 2 reads its value from event 
1, we call this relation reads-from (RF). This execution is complete as all of its 
reads read from a write and it is closed w.r.t conflict-free program order. 

We interpret the following program similarly, 


(r2 := y; x := r2) (LB2) 


leading to a symmetrical event structure where the write to x is dependent on 
the read from y. 

The interpretation of Init; (LB1 || LB2) gives the event structure where 
(LB1) and (LB2) are simply placed alongside one another. 


[Figure: event structure of Init; (LB1 || LB2), ending in the writes (W y 0), (W y 1), 
(W x 0) and (W x 1).] 


The interpretation of parallel composition is the union of the event structures 
from LB1 and LB2 without any additional conflict edges. When parallel composing 
the semantics of two programs, we add all RF-edges that satisfy a coherence 
axiom. Here we present an axiom that provides desirable behaviour in this 
example (Section 4 provides our model’s complete axioms). 


(DP ∪ RF) is acyclic 


The program Init; (LB1 || LB2) allows executions of the following three 
shapes. 

Note that in this example, we are not allowed to read the value 1. Reading 
a value that does not appear in the program is one sort of thin-air behaviour, as 
described by Batty et al. [7]. For example, the execution {1,4,5,8,9} does not 
satisfy the coherence axiom as $4 \xrightarrow{DP} 5 \xrightarrow{RF} 8 \xrightarrow{DP} 9 \xrightarrow{RF} 4$ forms a cycle. 
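
The coherence axiom can be checked mechanically. The Python sketch below tests acyclicity of DP ∪ RF for a candidate execution; the event numbering for LB2 (6 and 8 for the reads of y, 7 and 9 for the writes to x) and the treatment of Init as the single event 1 are assumptions made for the illustration.

```python
def is_acyclic(edges):
    """Depth-first search for a cycle in the directed graph given by edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
    state = {}  # node -> "open" while on the DFS stack, "done" afterwards

    def visit(node):
        state[node] = "open"
        for succ in graph.get(node, []):
            if state.get(succ) == "open":
                return False          # back edge: a cycle exists
            if succ not in state and not visit(succ):
                return False
        state[node] = "done"
        return True

    return all(visit(n) for n in list(graph) if n not in state)


# Semantic dependencies of LB1 (events 2..5) and LB2 (events 6..9).
DP = {(2, 3), (4, 5), (6, 7), (8, 9)}


def coherent(events, rf):
    """Check that (DP ∪ RF), restricted to the events of the execution, is acyclic."""
    dp = {(a, b) for a, b in DP if a in events and b in events}
    return is_acyclic(dp | set(rf))


# Both reads take the value 0 from Init: allowed.
reads_zero = coherent({1, 2, 3, 6, 7}, {(1, 2), (1, 6)})
# Each read takes the value 1 from the other thread's write: rejected.
thin_air = coherent({1, 4, 5, 8, 9}, {(5, 8), (9, 4)})
```

The second call reproduces the cycle through events 4, 5, 8 and 9 discussed above, so that execution is rejected, while the first one is accepted.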


We now substitute (LB2) with the following code snippet 


(r1 := y; x := 1) (LB3) 


where the value written to the variable x is a constant. Its generated event 
structure is depicted as follows 


In this program, for each branch, we can reach a write of value 1 to location 
x. Hence, this will happen no matter which branch is chosen: we say b and d 
are independent writes and we draw no dependency edges from their preceding 
reads. 

Consider now the program (LB3) in parallel with LB1 introduced earlier in 
this section. As usual, we interpret the Init program in sequence with (LB1 || 
LB3) as follows: 


The resulting event structure is very similar to that of (LB1 || LB2), but the 
executions permitted in this event structure are different. The dependency edges 
calculated when adding the read are preserved, and now executions {1, 2,3, a, b} 
and {1,a,b,4,5} are allowed. However, this event structure also contains the 
execution in which d is independent. 

In the execution {d → 4 → 5 → c} there is no RF or DP edge between d and c 
that can create a cycle, hence this is a valid complete execution in which we can 
observe x = 1, y = 1. Note that the Init is irrelevant in the consistency of this 
execution. 


Modularity. It is worthwhile underlining the role that modularity plays here. In 
order to compute the behaviour of (LB1 || LB2) and (LB1 || LB3) we did not 
have to compute the behaviour of LB1 again. In fact, we computed the semantics 
of LB1, LB2 and LB3 in isolation and then we observed the behaviour in parallel 
composition. 

Thin-air values. The program (LB1 || LB2) is a standard example in the weak 
memory literature called load buffering. In the program (LB1 || LB2), if event 5 
or 9 were allowed in a complete execution, that would be an undesirable thin-air 
behaviour: there is no value 1 in the program text, nor does any operation in the 
program compute the value 1. The program (LB1 || LB3) is similar, but now 
contains a write of value 1 in the program text, so this is no longer a thin-air 
value. Note that the execution given for it is not sequentially consistent, but 
nonetheless a weak memory model needs to allow it so that a compiler can, for 
example, swap the order of the two commands in LB3, which are completely 
independent of each other from its perspective. 


2 Event Structures 


Event structures will form the semantic domain of our denotational semantics 
in Section 5. Our presentation follows the essential ideas of Winskel [33] and is 
further influenced by the treatment of shared memory by Jeffrey and Riely [15]. 


2.1 Background 


A partial order $(E, \sqsubseteq)$ is a set $E$ equipped with a reflexive, transitive and 
antisymmetric relation $\sqsubseteq$. A well-founded partial order is a partial order that has 
no infinite decreasing chains of the form $\cdots \sqsubseteq e_{i-1} \sqsubseteq e_i \sqsubseteq e_{i+1} \sqsubseteq \cdots$. 

A prime event structure is a triple $(E, \sqsubseteq, \#)$. $E$ is a set of events, $\sqsubseteq$ is a 
well-founded partial order on $E$ and $\#$ is a conflict relation on $E$. $\#$ is binary, 
symmetric and irreflexive such that, for all $c, d, e \in E$, if $c \# d \sqsubseteq e$ then $c \# e$. We 
write $\mathrm{Con}(\mathcal{E})$ for the set of conflict-free subsets of $E$, i.e. those subsets $C \subseteq E$ 
for which there is no $c, d \in C$ such that $c \# d$. 


Notation. We use $E$ to range over (prime/labelled/memory) event structures, 
and also the event set contained within, when there is no ambiguity. We also use 
$\mathcal{E}$ for event structures. 

A labelled event structure $(E, \sqsubseteq, \#, \lambda)$, over a set of labels $\Sigma$, is a prime event 
structure together with a function $\lambda : E \to \Sigma$ which assigns a label to an event. 
We make events explicit using the notation $\{e : \sigma\}$ for $\lambda(e) = \sigma$. We sometimes 
avoid using names and just write the label $\sigma$ when there is no risk of confusion. 

Consider the labelled event structure formed by the set $\{1, 2, 3, 4\}$, where the 
order relation is defined such that $1 \sqsubseteq 2 \sqsubseteq 3$ and $1 \sqsubseteq 4$, the conflict relation is 
defined such that $2 \# 4$ and $3 \# 4$, and the labelling function is defined such that 
$\lambda(1) = (W\ x\ 0)$, $\lambda(2) = (R\ x\ 0)$, $\lambda(3) = (W\ y\ 1)$ and $\lambda(4) = (R\ x\ 1)$. The event 
structure is visualised on the left (we elide conflict edges that can be 
inferred from order). 
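
The example can be replayed in Python to see how the elided conflict edge is recovered and which subsets are conflict-free. The helper names below are hypothetical; the sketch starts from the generating order pairs and the single conflict 2 # 4, and closes them under the conditions stated in the definitions above.

```python
from itertools import combinations

EVENTS = {1, 2, 3, 4}


def reflexive_transitive_closure(events, pairs):
    closure = set(pairs) | {(e, e) for e in events}
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def close_conflict(conflict, order):
    """Make conflict symmetric and closed under: c # d and d below e imply c # e."""
    closed = set(conflict) | {(d, c) for c, d in conflict}
    changed = True
    while changed:
        changed = False
        for c, d in list(closed):
            for d2, e in order:
                if d == d2 and (c, e) not in closed:
                    closed |= {(c, e), (e, c)}
                    changed = True
    return closed


def conflict_free_subsets(events, conflict):
    return [set(s)
            for k in range(len(events) + 1)
            for s in combinations(sorted(events), k)
            if not any((a, b) in conflict for a in s for b in s)]


ORDER = reflexive_transitive_closure(EVENTS, {(1, 2), (2, 3), (1, 4)})
CONFLICT = close_conflict({(2, 4)}, ORDER)
CON = conflict_free_subsets(EVENTS, CONFLICT)
```

Starting from 2 # 4 alone, the closure adds 3 # 4 because 2 is below 3, which is the edge elided in the picture; the conflict-free subsets are exactly those that do not contain 4 together with 2 or 3.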

Given labelled event structures $\mathcal{E}_1$ and $\mathcal{E}_2$ define the product labelled event
structure $\mathcal{E}_1 \times \mathcal{E}_2 = (E, \sqsubseteq, \#, \lambda)$. $E$ is $E_1 \cup E_2$, assuming $E_1$ and $E_2$ to be disjoint,
$\sqsubseteq$ is $\sqsubseteq_1 \cup \sqsubseteq_2$, $\#$ is $\#_1 \cup \#_2$ and $\lambda$ is $\lambda_1 \cup \lambda_2$.

The coproduct labelled event structure $\mathcal{E}_1 + \mathcal{E}_2$ is the same as the product,
except that the conflict relation $\#$ is $\#_1 \cup \#_2 \cup (E_1 \times E_2) \cup (E_2 \times E_1)$. We
can use a similar construction for the coproduct of an infinite set of pairwise-disjoint
labelled event structures, indexed by $I$: we take infinite unions on the
underlying sets and relations, along with extra conflicts for every pair of indices.
Where the $\mathcal{E}_i$ are not disjoint, we can make them so by renaming with fresh
event identifiers. In particular, we will need the infinite coproduct $\sum_{i \in I} \mathcal{E}$ with
as many copies of $\mathcal{E}$ as the cardinality of the set $I$, and all the events between
each copy in conflict. Each of these copies will be referred to as $\mathcal{E}^i$.

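For a labelled event structure $\mathcal{E}_0$ and an event $e$, where $e \notin E_0$, define the
prefix labelled event structure, $e \bullet \mathcal{E}_0$, as a labelled event structure $(E, \sqsubseteq, \#, \lambda)$
where $E$ equals $E_0 \cup \{e\}$, $\sqsubseteq$ equals $\sqsubseteq_0 \cup (\{e\} \times E)$, and $\#$ equals $\#_0$.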
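The three constructions above (product, coproduct and prefix) are small enough to write out directly. The sketch below is a hedged illustration in Python: the class `LES` and the function names are hypothetical, relations are sets of pairs, and the labelling of the prefixed event is an assumption (the definition of prefix only fixes the events, order and conflict).

```python
from dataclasses import dataclass


@dataclass
class LES:
    """A labelled event structure with relations stored as sets of pairs."""
    events: set
    order: set
    conflict: set
    labels: dict


def product(a, b):
    """Disjoint union of two structures: no new order, no new conflict."""
    assert not (a.events & b.events), "event sets must be disjoint"
    return LES(a.events | b.events, a.order | b.order,
               a.conflict | b.conflict, {**a.labels, **b.labels})


def coproduct(a, b):
    """As product, plus conflict between every event of a and every event of b."""
    p = product(a, b)
    cross = {(x, y) for x in a.events for y in b.events}
    p.conflict |= cross | {(y, x) for (x, y) in cross}
    return p


def prefix(e, label, a):
    """Put the fresh event e below every event; conflict is unchanged."""
    assert e not in a.events, "the prefixed event must be fresh"
    events = a.events | {e}
    return LES(events, a.order | {(e, x) for x in events},
               set(a.conflict), {**a.labels, e: label})


# Two one-event structures with illustrative labels.
left = LES({"a"}, {("a", "a")}, set(), {"a": ("W", "y", 0)})
right = LES({"b"}, {("b", "b")}, set(), {"b": ("W", "y", 1)})
both = product(left, right)
either = coproduct(left, right)
guarded = prefix("r", ("R", "x", 0), left)
```

Here `both` keeps the two writes compatible, `either` makes them mutually exclusive, and `guarded` places a read below the write of `left`.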


2.2 The fork-join event structure 


Our language supports parallel composition nested under sequential composi- 
tion, so we will need to model spawning threads and a subsequent wait for their 
termination. To support this, we define the fork-join composition of two labelled 
event structures, €; x E2. First we define the leaves, | (£), as the C-maximal ele- 
ments of E. Let I be the set of maximal conflict-free subsets of | (£1). Intuitively, 
each event set in J corresponds to the last events* of one way of executing the 
concurrent threads in E1. We then generate a fresh copy of E> for each of the 
executions: E3 = J) jer E2- 

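Now $\mathcal{E}_1 * \mathcal{E}_2 = (E, \sqsubseteq, \#, \lambda)$ such that $E$ is $E_1 \cup E_3$, $\#$ is $\#_1 \cup \#_3$, $\lambda$ is $\lambda_1 \cup \lambda_3$, and
$\sqsubseteq$ is the transitive closure of

\[ \sqsubseteq_1 \;\cup\; \sqsubseteq_3 \;\cup\; \bigcup_{i \in I} \{ (e, e') \mid e \in i \wedge e' \in E_2^{i} \} \]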


The set of events, $E$, is the set $E_1$ plus all the elements from the copies that make up
$\mathcal{E}_3$. The order, $\sqsubseteq$, is constructed by linking every event in the copy $\mathcal{E}_2^{i}$ with all
the events in the set $i$, plus the obvious order from $\mathcal{E}_1$ and the order in the local
copy $\mathcal{E}_2^{i}$. Finally, the conflict relation is the union of the conflict in $\mathcal{E}_1$ and $\mathcal{E}_3$.
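The fork-join construction can be sketched as follows. This is an illustrative Python rendering under stated assumptions, not the paper's definition verbatim: the function names are hypothetical, copies of the second structure are made fresh by tagging events as `(k, e)` for the k-th index set, labels are omitted, and the returned order contains only generating edges (the transitive closure is left implicit).

```python
from itertools import combinations


def leaves(events, order):
    """Order-maximal events: nothing lies strictly above them."""
    return {e for e in events
            if not any(a == e and b != e for (a, b) in order)}


def maximal_conflict_free_subsets(candidates, conflict):
    """All conflict-free subsets of candidates that cannot be extended."""
    cands = sorted(candidates)
    free = [frozenset(c)
            for n in range(len(cands) + 1)
            for c in combinations(cands, n)
            if all((a, b) not in conflict for a in c for b in c)]
    return [s for s in free if not any(s < t for t in free)]


def fork_join(e1, o1, c1, e2, o2, c2):
    """Events, order generators and conflict of the fork-join composition."""
    index = maximal_conflict_free_subsets(leaves(e1, o1), c1)
    events, order, conflict = set(e1), set(o1), set(c1)
    for k, i in enumerate(index):
        copy = {e: (k, e) for e in e2}               # fresh names for copy k
        events |= set(copy.values())
        order |= {(copy[a], copy[b]) for (a, b) in o2}
        conflict |= {(copy[a], copy[b]) for (a, b) in c2}
        order |= {(e, copy[f]) for e in i for f in e2}   # last events of i come first
    for k in range(len(index)):                      # distinct copies exclude each other
        for m in range(len(index)):
            if k != m:
                conflict |= {((k, a), (m, b)) for a in e2 for b in e2}
    return events, order, conflict


# First structure: event 1 followed by two conflicting events 2 and 3.
e1 = {1, 2, 3}
o1 = {(1, 1), (2, 2), (3, 3), (1, 2), (1, 3)}
c1 = {(2, 3), (3, 2)}
# Second structure: a single event "w".
e2, o2, c2 = {"w"}, {("w", "w")}, set()
fj_events, fj_order, fj_conflict = fork_join(e1, o1, c1, e2, o2, c2)
```

In this example the leaves 2 and 3 are in conflict, so there are two index sets and the continuation event "w" is copied once per way of finishing the first structure.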


3 Coherent event structure

The signature of labels, $\Sigma$, is defined as follows:

\[ \Sigma = (\{R, W\} \times \mathcal{X} \times \mathcal{V}) + \{L\} + \{U\} \]

where $(W\,x\,v) \in \Sigma$ and $(R\,x\,v) \in \Sigma$ are the usual write and read operations
and $L$, $U$ are the lock and unlock operations respectively.

A coherent event structure is a tuple $(E, S, \vdash, \leq)$ where $E$ is a labelled event
structure. $S$ is a set of partial executions, where each execution is a tuple comprising
a maximal conflict-free set of events, together with an intra-thread reads-from
relation $RF_i$, an extra-thread reads-from relation $RF_e$, a dependency relation $DP$, and a
partial order on lock/unlock events $LK$. The justification relation, $\vdash$, is a relation
between conflict-free sets and events. The preserved program order, $\leq^{X}$,
is a restriction of the program order, $\sqsubseteq$, for events on the same variable. $\leq^{L}$ is
the restriction of program order on events related in program order with locks
or unlocks. Finally, we define $RF$ to be $RF_e \cup RF_i$ and $\leq$ to be $\leq^{X} \cup \leq^{L}$. For a
partial execution, $X \in S$, we denote its components as $LK_X$, $RF_X$ and $DP_X$.

4 We assume that there are no infinite increasing $\sqsubseteq$-chains in $\mathcal{E}$.

Justification, $\vdash$, collects dependency information in the program and is used
to calculate $DP_X$. For a conflict-free set $C$ and an event $e$, we say $C$ justifies
$e$ or $e$ depends on $C$ whenever $C \vdash e$. We collect dependencies between events
modularly in order to identify the so-called independent writes which will be
introduced shortly.

For a given partial execution, $X$, we define the order $HB_X$ as the reflexive
transitive closure of $(\sqsubseteq \cup LK_X)$. A coherent event structure contains a data race
if there exists an execution $X$, with two events on the same variable $x$, at least
one of which is a write, that are not ordered by $HB_X$. A coherent event structure
is data-race-free if it does not contain any data race. A racy $RF_X$-edge is when
two events $w$ and $r$ are racy and $w \xrightarrow{RF_X} r$. Note that $RF_i$ edges cannot ever be
racy. We now define a coherent partial execution.


Definition 1 (Coherent Partial Execution). A partial execution $X$ is coherent
if and only if:

1. $(\leq \cup LK_X \cup DP_X \cup RF_{e,X})$ is acyclic, and
2. if $(w : W\,x\,v) \xrightarrow{RF_X} (r : R\,x\,v)$ there are no $(e : R\,x\,v')$ or $(e : W\,x\,v')$ such
that $w \leq^{X} e \leq^{X} r$ with $v \neq v'$.

A complete execution $X$ is an execution where all read events $r$ have a write
$w$ that they read from, i.e. $w \xrightarrow{RF_X} r$.
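Both the race condition and condition (1) of coherence reduce to simple graph checks. The Python sketch below is an illustration under assumptions: the function names are hypothetical, relations are given as sets of strict (irreflexive) edges, labels are tuples such as `("W", "x", 1)` or `("L",)`, and `has_race` is applied to the events of a single execution.

```python
def transitive_closure(edges):
    """Smallest transitive relation containing edges."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def is_acyclic(*relations):
    """Condition (1): the union of the given strict relations has no cycle."""
    edges = set().union(*relations)
    return all(a != b for (a, b) in transitive_closure(edges))


def has_race(labels, po, lk):
    """Two same-variable accesses, one a write, unordered by happens-before."""
    hb = transitive_closure(set(po) | set(lk))
    events = list(labels)
    for i, a in enumerate(events):
        for b in events[i + 1:]:
            la, lb = labels[a], labels[b]
            if la[0] not in ("R", "W") or lb[0] not in ("R", "W"):
                continue                      # locks and unlocks never race
            same_var = la[1] == lb[1]
            one_write = "W" in (la[0], lb[0])
            unordered = (a, b) not in hb and (b, a) not in hb
            if same_var and one_write and unordered:
                return True
    return False


# Unsynchronised write and read of x on two threads: racy.
racy = has_race({"w": ("W", "x", 1), "r": ("R", "x", 1)}, set(), set())

# The same accesses inside lock/unlock pairs, with the unlock ordered before the lock.
labels = {"l1": ("L",), "w": ("W", "x", 1), "u1": ("U",),
          "l2": ("L",), "r": ("R", "x", 1), "u2": ("U",)}
po = {("l1", "w"), ("w", "u1"), ("l2", "r"), ("r", "u2")}
lk = {("u1", "l2")}
locked = has_race(labels, po, lk)
```

The second execution is race free because the lock order links the unlock of the first thread to the lock of the second, which places the write before the read in happens-before.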


4 Weak memory model 


Central to the model is the way it records program dependencies in $\vdash$ and $DP$.
Justification, $\vdash$, records the structure of those dependencies in the program that
may be influenced by further composition. As we shall see, composing programs
may add or remove dependencies from justification: for example, composing a
read may make later writes dependent, or the coproduct mechanism, introduced
shortly, may remove them. In some parts of the program, e.g. inside locked
regions, dependencies do not interact with the context. In this case, we freeze
the justifications, using them to calculate $DP$. Following a freeze, the justification
relation is redundant and can be forgotten: $DP$ can be used to judge which
executions are coherent.


Freezing. Here we define a function freeze which takes a justification $C \vdash (w : W\,x\,v)$
and gives the corresponding dependency relation $(r : R\,x\,v) \xrightarrow{DP} (w : W\,x\,v)$
iff $r \in C$. We lift freeze to a function on an event structure as follows:

\[ \mathit{freeze}(E_1, S_1, \vdash_1, \leq_1) = (E_1, S, \emptyset, \leq_1) \qquad (1) \]

where $S$ contains all the executions

\[ (X_1, LK_{X_1}, (DP_{X_1} \cup DP), RF_{X_1}) \]

where for each write, $w_i \in X_1$, we choose a justification so that $C_1 \vdash_1 w_1, \ldots, C_n \vdash_1 w_n$
covers all writes in $X_1$. Furthermore, with $DP$ defined as follows:

\[ DP = \bigcup_{i \in \{1, \ldots, n\}} \mathit{freeze}(C_i \vdash w_i) \]

$X_1$ must be a coherent execution. We prove that for a coherent execution there
always exists a choice of write justifications that freeze into dependencies to form
a coherent execution.


We will illustrate freezing of the program,

r1 := x; r2 := t; if (r1 == 1 ∨ r2 == 1) {y := 1}

whose event structure is as follows:

[Figure: event structure of the program, with conflicting reads of x and of t followed by three (W y 1) events.]


The rules later on in this section will provide us with justifications $\{(6 : R\,t\,1)\} \vdash (9 : W\,y\,1)$
and $\{(2 : R\,x\,1)\} \vdash (9 : W\,y\,1)$ (but not the independent justification
$\vdash (9 : W\,y\,1)$). So in this program there are two minimal justifications of
$(9 : W\,y\,1)$. The result of freezing is to duplicate all partial executions for each
choice of write justifications. In this case, we get an execution containing $2 \xrightarrow{DP} 9$
and another one containing $6 \xrightarrow{DP} 9$.
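The duplication of executions by freezing can be made concrete. The Python sketch below is an illustration only: the function names are hypothetical, a justification is a pair of a frozen set of events and the justified write, and the coherence check that freezing also requires is deliberately left out. It enumerates one dependency relation per choice of justification for each write.

```python
from itertools import product


def freeze_justification(justifying_set, write, labels):
    """DP edges from each read of the justifying set to the justified write."""
    return {(r, write) for r in justifying_set if labels[r][0] == "R"}


def freeze_choices(justifications, writes, labels):
    """One DP relation per way of picking a justification for every write."""
    options = [[c for (c, w) in justifications if w == write]
               for write in writes]
    result = []
    for choice in product(*options):
        dp = set()
        for c, w in zip(choice, writes):
            dp |= freeze_justification(c, w, labels)
        result.append(dp)
    return result


# The two minimal justifications of event 9 from the example.
labels = {2: ("R", "x", 1), 6: ("R", "t", 1), 9: ("W", "y", 1)}
justifications = [(frozenset({6}), 9), (frozenset({2}), 9)]
frozen = freeze_choices(justifications, [9], labels)
```

With two justifications for the single write 9, the result contains two dependency relations, one with an edge from 6 to 9 and one with an edge from 2 to 9, matching the two executions described above.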


4.1 Prepending single events 


When prepending loads and stores, we model forwarding optimisations by updating
the justification relation: e.g. when prepending a write, $(w : W\,x\,0)$, to
an event structure where $\{(r : R\,x\,0)\} \vdash w'$, write forwarding satisfies the read
of the justification, leaving an independently justified write, $\vdash w'$.


608 M. Paviotti et al. 


Forwarding is forbidden if there exists $e$ in $E$ such that $w \leq e \leq r$, as in the
example on the left. In this example we do not forward 1 to 6. The rules of this
section give us that $\{1, 3, 6\} \vdash 9$: we have preserved program order over the
accesses of $x$, $1 \leq 3 \leq 6$, and we do not forward across the intervening read 3.


Read Semantics We now define the semantics of read prepending as follows:

\[ (r : R\,x\,v) \bullet (E_1, S_1, \vdash_1, \leq_1) \triangleq ((r : R\,x\,v) \bullet E_1, S, \vdash, \leq) \qquad (2) \]

where preserved program order $\leq$ is built straightforwardly out of $\leq_1$, ordering
locks, unlocks and same-location accesses, and $S$ is defined as the set of all
$(X \cup \{r\}, LK_X, RF_X, DP_X)$, where $X$ is a partial execution of $S_1$ and $\vdash$ is the
smallest relation such that for all $C_1 \vdash_1 e$ we have

\[ C_1 \cup \{r\} \setminus LF \vdash e \]

with $LF$ being the “Load Forwarded” set of reads, i.e. the set of reads consecutively
following the matching prepended one:

\[ LF = \{ (r' : R\,x\,v) \in C_1 \mid \nexists e.\; r \leq^{X} e \leq^{X} r' \} \]

This allows for load forwarding optimisations and coherence is satisfied by
construction.


Write Semantics The write semantics are then defined as follows:

\[ (w : W\,x\,v) \bullet (E_1, S_1, \vdash_1, \leq_1) \triangleq ((w : W\,x\,v) \bullet E_1, S, \vdash, \leq) \qquad (3) \]

where $\leq$ is built as in the read rule and $S$ contains all coherent executions of
the form,

\[ (X \cup \{w\}, LK_X, (RF_X \cup RF_i), DP_X) \]

where $X \in S_1$, and $w \xrightarrow{RF_i} r$ for any set of matching reads $r$ in $E_1$ such that
condition (1.2) of coherence is satisfied. Adding $RF_i$ edges leaves condition (1.1)
satisfied.

The justification relation is the smallest upward-closed relation such that
for all $C \vdash_1 e$:

1. $\vdash w$
2. $C \setminus SF \cup \{w\} \vdash e$ if there exists $e' \in C$ s.t. $w \leq^{X} e'$
3. $C \setminus SF \vdash e$ otherwise

with $SF$ being the Store Forwarding set of reads, i.e. the set of reads that we are
going to remove from the justification set for later events that are matching the
write we are prepending. This is defined as follows:

\[ SF = \{ (r' : R\,x\,v) \mid \nexists e.\; w \leq^{X} e \leq^{X} r' \} \]

When prepending a write to an event structure, we add it to justifications
that contain a read to the same variable. Failing to do so would invalidate the
DRF-SC property. We provide an example in Section 6.3, but we first need to
complete the definition of the semantics; in particular, we need to explain
how the writes are lifted. This is the subject of the next section (Section 4.2).


4.2 Coproduct semantics 


The coproduct mechanism is responsible for making writes independent of prior 
reads if they are sure to happen, regardless of the value read. It produces the 
independent writes that enabled relaxed behaviour in the example in Section 1. 

In the definition of coproduct we use an upward-closure of justification to
enable the lifting of more dependencies. Whenever $C \vdash e$ we define $\uparrow(C)$ as the
upward-closed justification set, i.e. $D \in \uparrow(C)$ if $C \vdash e$, $D$ is a conflict-free lock-free
set with $C \subseteq D$, such that for all $e' \in D$ if $e''$ is an event such that $e'' \leq e'$ then
$e'' \in D$.

Now we define the coproduct operation. If $\mathcal{E}_1$ is a labelled event structure of
the form $(r_1 : R\,x\,v_1) \bullet \mathcal{E}_1'$ and, similarly, $\mathcal{E}_2$ is of the form $(r_2 : R\,x\,v_2) \bullet \mathcal{E}_2'$,
the coproduct of event structures is defined as,

\[ (E_1, S_1, \vdash_1, \leq_1) + (E_2, S_2, \vdash_2, \leq_2) \triangleq (E_1 + E_2, S_1 \cup S_2, (\vdash_1 \cup \vdash_2 \cup \vdash), \leq) \]

where whenever $\{r_1\} \cup C_1 \vdash_1 (w : W\,y\,v)$ and $\{r_2\} \cup C_2 \vdash_2 (w' : W\,y\,v)$ then if
the following conditions hold, we have $D' \vdash w$ and $D'' \vdash w'$:

1. there exists a $D' \in \uparrow(C_1)$ that is isomorphic to a $D'' \in \uparrow(C_2)$, that is, there
exists $f : D' \to D''$ that is a $\lambda$-preserving and $\leq^{X}$-preserving bijection,
2. there is no event $e$ in $D'$ such that $r_1 \leq^{X} e$


The example of Section 1 illustrates the application of condition (1) of coproduct.
Recall the event structures of (LB$_1$) and (LB$_2$) respectively.

[Figure: event structures of (LB$_1$), with events numbered 2 to 5, and (LB$_2$), with events a to d.]

In each case, the event structure is built as the coproduct of the conflicting
events. In (LB$_2$), prior to applying coproduct we have $\{a\} \vdash b$ and $\{c\} \vdash d$. The
writes have the same label for both read values so, taking $C_1$ and $C_2$ to be empty,
coproduct makes them independent, adding the independent writes $\vdash b$ and $\vdash d$.
In contrast, the values of writes 3 and 5 differ in (LB$_1$), so the coproduct has
$\{2\} \vdash 3$ and $\{4\} \vdash 5$. When ultimately frozen, the justifications of (LB$_1$) will
produce the dependency edges (2, 3) and (4, 5) as described in Section 1.

As for condition (2), if there is an event in the justification set that is ordered
in $\leq^{X}$ with the respective top read, then the top read cannot be erased from the
justification. Doing so would break the $\leq^{X}$ link.

When having value sets that contain more than two values, we use $\sum_{v \in \mathcal{V}}$ to
denote a simultaneous coproduct (rather than the infinite sum). More precisely,
if we coproduct the event structures $\mathcal{E}_0, \mathcal{E}_1, \cdots, \mathcal{E}_n$ in a pairwise fashion as
follows,

\[ \mathcal{E}_0 + \mathcal{E}_1 + \cdots + \mathcal{E}_n \]


we would get liftings that are undesirable. To see this, it suffices to consider the 
program, 
if (r==3) {x := 2} {x :=1} 


where the write to x of 1 is independent for a coproduct over values 1 and 2, but 
not when considering the event structure following (R < 3). 


4.3 Lock semantics 


When prepending a lock, we order the lock before following events in $\leq$ and we
freeze the justifications into dependencies. By freezing, we prevent justifications
from events after the lock from interacting with newly appended events. This
disables optimisations across the lock, e.g. store and load forwarding.

We define the semantics of locks as follows,

\[ (l : L) \bullet (E_1, \vdash_1, S_1, \leq_1) = ((l : L) \bullet E_1, \emptyset, S, \leq) \qquad (4) \]

where $\leq^{X}$ remains unchanged and $(E_1, \emptyset, S_1', \leq_1) = \mathit{freeze}(E_1, \vdash_1, S_1, \leq_1)$, where
$S$ contains all partial executions of the form,

\[ (X \cup \{l\}, (LK_X \cup LK), DP_X, RF_X) \]

where $X \in S_1'$ and the lock order $LK$ is such that for all lock or unlock events
$l' \in X$, $l \xrightarrow{LK} l'$. Finally, $\leq^{L}$ is $\leq^{L}_1$ extended with the lock ordered before all
events in $E_1$.

The semantics for the unlock is similar. 


4.4 Parallel composition 


We define the parallel semantics as follows. Note that this operation freezes the 
constituent denotations before combining them, erasing their respective justifi- 
cation relations. This choice prevents the optimisation of dependencies across 
forks and it makes thread inlining optimisations unsound, as they are in the 
Promising Semantics [16] and the Java memory model [21]. 


\[ (E_1, S_1, \vdash_1, \leq_1) \times (E_2, S_2, \vdash_2, \leq_2) = (E_1 \times E_2, S, \emptyset, \leq_1 \cup \leq_2) \]

where $S$ are all coherent partial executions of the form,

\[ (X_1 \cup X_2, (LK_{X_1} \cup LK_{X_2} \cup LK), (DP_{X_1} \cup DP_{X_2}), (RF_{X_1} \cup RF_{X_2} \cup RF_e)) \]

where $X_1 \in S_1^{F}$, $X_2 \in S_2^{F}$ and

– $\mathit{freeze}(E_1, S_1, \vdash_1, \leq_1) = (E_1, S_1^{F}, \emptyset, \leq_1)$
– $\mathit{freeze}(E_2, S_2, \vdash_2, \leq_2) = (E_2, S_2^{F}, \emptyset, \leq_2)$

Furthermore, $LK$ is constrained so that $(LK_{X_1} \cup LK_{X_2} \cup LK)$ is a total order over
the lock/unlock operations such that no lock/unlock operation is introduced
between a lock and the next unlock on the same thread. Finally, we add all
$(w : W\,x\,v) \xrightarrow{RF_e} (r : R\,x\,v)$ edges such that the execution satisfies condition
(1.1) of coherence† and such that $w$ belongs to $S_1^{F}$ and $r$ belongs to $S_2^{F}$ or vice
versa.


4.5 Join Semantics 


We define the join composition as follows:

\[ (E_1, S_1, \vdash_1, \leq_1) * (E_2, S_2, \vdash_2, \leq_2) \triangleq (E_1 * E_2, S, \vdash_1, \leq) \qquad (5) \]

where $\leq$ is built as in the read rule and $S$ are all executions of the form

\[ (X_1 \cup X_2, (LK_{X_1} \cup LK_{X_2} \cup LK), (DP_{X_1} \cup DP_{X_2}), (RF_{X_1} \cup RF_{X_2} \cup RF_i)) \]

where $X_1 \in S_1$ and $X_2 \in S_2$ with $X_1$ and $X_2$ conflict-free. Lock order $LK$ orders
all lock/unlock of $X_1$ before all lock/unlock of $X_2$ and $w \xrightarrow{RF_i} r$ whenever $w \in X_1$
and $r \in X_2$ such that the execution is still coherent.


5 Language and Semantics 


We consider an imperative language that has sequential and parallel composition, 
and mutable shared memory. 


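Definition 2 (Language).

\[
\begin{aligned}
B &::= M = M \mid B \wedge B \mid B \vee B \mid \neg B \qquad\qquad M ::= n \mid r \\
P &::= \mathsf{skip} \mid r := x \mid x := M \mid P_1 ; P_2 \mid P_1 \parallel P_2 \mid \mathsf{if}(B)\{P_1\}\{P_2\} \\
  &\quad \mid \mathsf{while}(B)\{P\} \mid L \mid U
\end{aligned}
\]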


We have standard boolean expressions, $B$, and expressions, $M$, represented by
natural numbers, $n$, or registers, $r$. Finally we have the set of command statements,
$P$, where skip is the command that performs no action, $r := x$ reads
from a global variable and stores the value in $r$, $x := M$ computes the expression
$M$ and stores its value to the global variable $x$, $P_1 ; P_2$ is sequential composition,
and $P_1 \parallel P_2$ is parallel composition. We have standard conditional statements,
while loops, locks and unlocks. Moreover, a program $P$ is lock-well-formed⁵ if on
every thread, every lock is paired with a following unlock instruction and vice
versa, and there is no lock or unlock operation between pairs.

† Note that condition (1.2) does not need to be checked.

A register environment, $\mathcal{R} \to \mathcal{V}$, is a function from the set of local registers, $\mathcal{R}$,
to the set of values, $\mathcal{V}$. A continuation is a function taking a register environment,
$\mathcal{R} \to \mathcal{V}$, to an event structure, $E$. We write $\emptyset$ as a short-hand for $\lambda\rho.\emptyset$, the
continuation returning the empty event structure.

We interpret the syntax defined above into the semantic domain defined in
Section 4. In Figure 1, we define $\llbracket - \rrbracket$ as a function which takes a step-index $n$,
a register environment $\rho$, and a continuation $\kappa$, and returns a coherent event
structure.

The interpretation function $\llbracket - \rrbracket$ is defined first by induction on the step-index
and then by induction on the syntax of the program. When $n = 1$ the interpretation
gives the empty event structure (undefined). Otherwise we proceed by
induction on the structure of the program. skip is just the continuation applied
to the environment. A read is interpreted as a set of conflicting read events for
each value $v$ attached with a continuation applied to the environment where the
register is updated with $v$.

A write is interpreted as a write with a following continuation. We interpret 
sequencing by interpreting the second program and passing it on to the interpre- 
tation of the first as a continuation. Parallel composition is the interpretation 
of the two programs with empty continuations passed to the $\times$ operator. The
conditional statement is interpreted as usual. For interpreting the while-loops 
we use the induction hypothesis on the step-index [9]. 

When parallel composing two threads, we want to forbid any reordering with 
events sequenced before or after the composition (as thread inlining would do). 
To forbid this local reordering we surround this composition with two lock-unlock 
pairs. 
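The continuation-passing shape of the interpretation can be illustrated on the sequential fragment. The Python sketch below is a simplification under loud assumptions, not the semantics of Figure 1: it omits locks, parallel composition, justification and executions; it represents an event structure as a forest of `(label, subforest)` pairs in which the sibling reads produced by one read command are understood to be in conflict; the value set `VALUES` is assumed finite; and programs are tuples with hypothetical tags.

```python
VALUES = (0, 1)  # assumed finite set of values


def denote(prog, n, rho, kont):
    """Interpret prog with step index n, registers rho and continuation kont."""
    if n <= 1:
        return []                                  # step index exhausted: empty structure
    kind = prog[0]
    if kind == "skip":
        return kont(rho)                           # continuation applied to the environment
    if kind == "read":                             # ("read", register, variable)
        _, r, x = prog
        return [(("R", x, v), kont({**rho, r: v})) for v in VALUES]
    if kind == "write":                            # ("write", variable, number or register)
        _, x, m = prog
        value = rho[m] if isinstance(m, str) else m
        return [(("W", x, value), kont(rho))]
    if kind == "seq":                              # second program becomes the continuation
        _, p1, p2 = prog
        return denote(p1, n, rho, lambda rho2: denote(p2, n, rho2, kont))
    if kind == "if":                               # ("if", predicate on rho, then, else)
        _, cond, p1, p2 = prog
        return denote(p1 if cond(rho) else p2, n, rho, kont)
    if kind == "while":                            # unfold once with a smaller step index
        _, cond, body = prog
        if cond(rho):
            return denote(("seq", body, prog), n - 1, rho, kont)
        return kont(rho)
    raise ValueError(kind)


def empty(rho):
    """The continuation returning the empty event structure."""
    return []


branchy = ("seq", ("read", "r1", "x"),
           ("if", lambda rho: rho["r1"] == 1, ("write", "y", 1), ("write", "y", 1)))
forest = denote(branchy, 5, {}, empty)

loop = ("while", lambda rho: rho["r"] == 0, ("read", "r", "x"))
bounded = denote(loop, 3, {"r": 0}, empty)
```

The read produces one branch per value, each carrying the continuation run in the updated environment, and the loop is cut off once the step index runs out.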


5.1 Compositionality 
We define the language of contexts inductively in the standard way. 


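Definition 3 (Context).

\[
\begin{aligned}
C &::= [-] \mid P ; C \mid C ; P \mid (C \parallel P) \mid (P \parallel C) \\
  &\quad \mid \mathsf{if}\,(B)\,\{C\}\{P\} \mid \mathsf{if}\,(B)\,\{P\}\{C\} \mid \mathsf{while}\,(B)\,\{C\}
\end{aligned}
\]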


In the base case, the context is a hole, denoted by $[-]$. The inductive cases follow
the structure of the program syntax. In particular, a context can be a program
$P$ in sequence with a context, a context in sequence with a program $P$ and so
on. For a context $C$ we denote by $C[P]$ the result of the inductively defined function on the
context $C$ that substitutes the program $P$ in every hole.
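Hole filling is a plain structural recursion. The Python sketch below is illustrative only: the class names are hypothetical, conditions are kept as opaque strings, and reads, locks and unlocks are omitted since they are leaves handled like `Skip` and `Write`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hole:
    pass


@dataclass(frozen=True)
class Skip:
    pass


@dataclass(frozen=True)
class Write:
    var: str
    value: int


@dataclass(frozen=True)
class Seq:
    left: object
    right: object


@dataclass(frozen=True)
class Par:
    left: object
    right: object


@dataclass(frozen=True)
class If:
    cond: str
    then: object
    other: object


@dataclass(frozen=True)
class While:
    cond: str
    body: object


def fill(context, program):
    """Return C[P]: the context with program substituted for every hole."""
    if isinstance(context, Hole):
        return program
    if isinstance(context, (Seq, Par)):
        return type(context)(fill(context.left, program), fill(context.right, program))
    if isinstance(context, If):
        return If(context.cond, fill(context.then, program), fill(context.other, program))
    if isinstance(context, While):
        return While(context.cond, fill(context.body, program))
    return context                      # leaves contain no hole


ctx = Seq(Write("x", 1), While("r == 0", Hole()))
filled = fill(ctx, Write("y", 1))
```

Because `fill` follows the grammar of contexts case by case, a proof by induction on the context has exactly one case per branch of this function.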


5 Jeffrey and Riely [15] adopt the same restriction. We conjecture that modelling 
blocking locks [4] would not affect the DRF-SC property. 
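\[
\begin{aligned}
\llbracket P \rrbracket_{1}\,\rho\,\kappa &= \emptyset \\
\llbracket \mathsf{skip} \rrbracket_{n}\,\rho\,\kappa &= \kappa(\rho) \\
\llbracket r := x \rrbracket_{n}\,\rho\,\kappa &= \textstyle\sum_{v \in \mathcal{V}} \big( (R\,x\,v) \bullet \kappa(\rho[r \mapsto v]) \big) \\
\llbracket x := M \rrbracket_{n}\,\rho\,\kappa &= (W\,x\,\llbracket M \rrbracket_{\rho}) \bullet \kappa(\rho) \\
\llbracket P_1 ; P_2 \rrbracket_{n}\,\rho\,\kappa &= \llbracket P_1 \rrbracket_{n}\,\rho\,(\lambda\rho.\,\llbracket P_2 \rrbracket_{n}\,\rho\,\kappa) \\
\llbracket L \rrbracket_{n}\,\rho\,\kappa &= (L \bullet E_1, \vdash_1) \quad \text{where } (E_1, \vdash_1) = \kappa(\rho) \\
\llbracket U \rrbracket_{n}\,\rho\,\kappa &= (U \bullet E_1, \vdash_1) \quad \text{where } (E_1, \vdash_1) = \kappa(\rho) \\
\llbracket P_1 \parallel P_2 \rrbracket_{n}\,\rho\,\kappa &= \llbracket L ; U \rrbracket_{n}\,\rho\,\kappa' \\
&\quad \text{where } \kappa' = \big( \lambda\rho.\,(\llbracket P_1 \rrbracket_{n}\,\rho\,\emptyset) \times (\llbracket P_2 \rrbracket_{n}\,\rho\,\emptyset) * (\llbracket L ; U \rrbracket_{n}\,\rho\,\kappa) \big) \\
\llbracket \mathsf{if}(B)\{P_1\}\{P_2\} \rrbracket_{n}\,\rho\,\kappa &=
  \begin{cases}
    \llbracket P_1 \rrbracket_{n}\,\rho\,\kappa & \llbracket B \rrbracket_{\rho} = T \\
    \llbracket P_2 \rrbracket_{n}\,\rho\,\kappa & \llbracket B \rrbracket_{\rho} = F
  \end{cases} \\
\llbracket \mathsf{while}(B)\{P\} \rrbracket_{n}\,\rho\,\kappa &=
  \begin{cases}
    \llbracket P ; \mathsf{while}(B)\{P\} \rrbracket_{n-1}\,\rho\,\kappa & \llbracket B \rrbracket_{\rho} = T \\
    \llbracket \mathsf{skip} \rrbracket_{n}\,\rho\,\kappa & \llbracket B \rrbracket_{\rho} = F
  \end{cases}
\end{aligned}
\]

Fig. 1: Semantic interpretation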


The following lemma shows that the semantics preserve context application. 
This falls out from the fact that the semantic interpretation is compositional, 
that is, we define every constructor in terms of its subcomponents. 


Lemma 1 (Compositionality). For all programs $P_1$, $P_2$, if $\llbracket P_1 \rrbracket = \llbracket P_2 \rrbracket$ then
for all contexts $C$, $\llbracket C[P_1] \rrbracket = \llbracket C[P_2] \rrbracket$.

The proof is a straightforward induction on the context $C$ and it follows from the
fact that the semantics is inductively defined on the program syntax. The attentive
reader may note that to prove $\llbracket P_1 \rrbracket = \llbracket P_2 \rrbracket$ in the first place we have to assume $n$,
$\rho$ and $\kappa$ and prove $\llbracket P_1 \rrbracket_{n}\,\rho\,\kappa = \llbracket P_2 \rrbracket_{n}\,\rho\,\kappa$. It is customary however in denotational
semantics to have programs denoted by functions that are equal if they are equal
at all inputs [31].


5.2 Data Race Freedom 


Data race freedom ensures that we forbid optimisations which could lead to
unexpected behaviour even in the absence of data races. We first define the
closed semantics for a program $P$. For all $n$, the semantics of $P$, namely $\llbracket P \rrbracket$,
is $\llbracket \mathit{Init}(P) \rrbracket_{n}\,(\lambda x.0)\,\emptyset$, where $\mathit{Init}(P)$ is the program that takes the global variables
in $P$ and initialises them to 0. We now establish that race-free programs
interpreted in the closed semantics have sequentially consistent behaviour.


DRF semantics. Rather than proving DRF-SC directly, we prove that race-free
programs behave according to an intermediate semantics $(\!| - |\!)$. This semantics
differs from $\llbracket - \rrbracket$ in only two ways: program order is used in the calculation of
coherence instead of preserved program order, and no dependency edges are
recorded (as these are subsumed by program order). More precisely, the semantics
is calculated as in Figure 1 but we check that $(RF_e \cup LK \cup \sqsubseteq)$ is acyclic.
Note that race-free executions of the intermediate semantics $(\!| - |\!)$ satisfy the
constraints of the model of Boehm and Adve [10], and the definition of race is
the same between the two models. Boehm and Adve prove that in the absence
of races, their model provides sequential consistency.

The DRF-SC theorem is stated as follows.

Theorem 1. For any program $P$, if $(\!| P |\!)$ is data race free then every execution
$D$ in $\llbracket P \rrbracket$ is a sequentially consistent execution, i.e. $D$ is in $(\!| P |\!)$.


6 Tests and Examples 


In this section, four examples demonstrate aspects of the semantics: the first 
recognises a false dependency, the second forbids unintended behaviour allowed 
by Jeffrey and Riely [15], the third motivates the choice to add forwarded writes 
to justification, and the last shows how we support an optimisation forbidden 
by Java but performed by the Hotspot compiler. 


6.1 LB+ctrl-double


In the first example, from Batty et al. [7], the compiler collapses conditionals to
transform $P_1$ to $P_2$.

$P_1$:  r1 := x; if (r1 == 1) { y := 1 } else { y := 1 }
$P_2$:  r1 := x; y := 1

Coproduct ensures that the denotations of $P_1$ and $P_2$ are identical, with the
event structure above, together with justification $\vdash b$ and $\vdash d$. From compositionality
(Lemma 1) and equality of the denotations, we have equal behaviour
of $P_1$ and $P_2$ in any context, and the optimisation is allowed.


6.2 Jeffrey and Riely’s TC7 


The next test is Java TC7. The outcome where $r_1$, $r_2$ and $r_3$ all have value 1 is
forbidden by Jeffrey and Riely [15, Section 7], but allowed in the Java Causality
Test Cases [27].

(TC7)
$T_1$:  r1 := z; r2 := x; y := r2
$T_2$:  r3 := y; z := r3; x := 1

As noted by Jeffrey and Riely [15], the failure of this test “indicates a failure to
validate the reordering of independent reads”.


[Figure: event structure of $T_1$; the leaf events 4, 5, 9 and 10 are labelled (W y 0), (W y 1), (W y 0) and (W y 1).]


In the event structure of $T_1$ above, the justification relation is constructed
according to Section 5. In particular, the rule for prepending reads (equation (4.1))
gives us $\{1, 2\} \vdash_{T_1} 4$ and $\{1, 3\} \vdash_{T_1} 5$ on the left-hand side, and $\{6, 7\} \vdash_{T_1} 9$ and
$\{6, 8\} \vdash_{T_1} 10$ on the right. When composing the left and right sides, the coproduct
rule (Section 4.2) makes four independent links, namely, $\{2\} \vdash_{T_1} 4$,
$\{3\} \vdash_{T_1} 5$, $\{7\} \vdash_{T_1} 9$, and $\{8\} \vdash_{T_1} 10$. This is because, at the top level, for
both branches, we can choose a write with the same label that is dependent on
the same reads (plus the top ones on $z$). More precisely, on the left-hand side
$C_1 = \{1, 2\}$ is such that $C_1 \vdash_{T_1} 4$, and on the right-hand side $C_2 = \{6, 7\}$ is such
that $C_2 \vdash_{T_1} 9$. When the top events, 1 and 6 respectively, are removed, these
contexts become isomorphic ($C_1[1] = C_2[6]$). Hence, $\{2\} \vdash_{T_1} 4$ and $\{7\} \vdash_{T_1} 9$,
and $\{3\} \vdash_{T_1} 5$ and $\{8\} \vdash_{T_1} 10$.
Now consider the event structure for the thread $T_2$.

[Figure: event structure of $T_2$, with reads 11 : (R y 0) and 12 : (R y 1) followed by writes 13 : (W z 0) and 14 : (W z 1).]

Here we have two independent writes, namely $\vdash_{T_2} (15 : W\,x\,1)$ and $\vdash_{T_2} (16 : W\,x\,1)$,
arising in the coproduct from $\{11\} \vdash_{T_2} (15 : W\,x\,1)$ and $\{12\} \vdash_{T_2} (16 : W\,x\,1)$.
Notice that by definition (3), we do not add the writes 13 and 14 to the justification
sets of any $W\,x\,1$, and because they write different values to $z$
depending on the value of $y$, we have the dependencies
$\{11\} \vdash_{T_2} 13$ and $\{12\} \vdash_{T_2} 14$.

When parallel composing, we connect the RF-edges that respect coherence. Thus we
obtain the execution
$\{16 \xrightarrow{RF} 8 \xrightarrow{DP} 10 \xrightarrow{RF} 12 \xrightarrow{DP} 14 \xrightarrow{RF} 6\}$, which is coherent, allowing the outcome
with $r_1$, $r_2$ and $r_3$ all 1 as desired.


[Figure, continued: events 15 and 16 of $T_2$, both labelled (W x 1).]


6.3 Adding writes to justifications 


In the definition of prepending writes (equation (3), condition (2)) we state that
for any given justification, if there is an event in the justification set that is
related via $\leq^{X}$ with the write we are prepending, then that write must be in the
justification set as well.

To see why we made this choice consider the following program, whose left-hand
side thread is

x := 1;
r1 := y;
if (r1 == 0) {
  x := 0; r2 := x; if (r2 == 1) {z := 1}
} else {
  r3 := x; if (r3 == 1) {z := 1}
}

[The right-hand side thread is not recoverable from the source.]

and its associated event structure,

[Figure: event structure of the program above.]


We focus on the interpretation of the left-hand side thread. In equation
(3), because $\{7\} \vdash 9$ and $3 \leq^{X} 7$, the event $(3 : W\,x\,0)$ gets inserted in the
justification set, leading to the justification $\{3, 7\} \vdash 9$. On the other branch,
up until the coproduct of the read on $y$, we have $\{5\} \vdash 8$. At this point, the
justifications $\{7\} \vdash 9$ and $\{5\} \vdash 8$ are not lifted because 9 requires 3 as well.
Event 3 may not be removed because of the condition in the write prepending
rule. Without this condition 3 would not be necessary to justify 9, yielding the
lifting of the link $\{5\} \vdash 8$. This would also cause the execution
$\{0 \rightarrow 5 \rightarrow 8 \rightarrow 1 \rightarrow 2\}$ to be coherent due to the lack of a dependency between
2 and 5.

This execution is not sequentially consistent, but under SC, the program is 
race free. Without writes in justifications, the model would violate the DRF-SC 
property described in Section 5.2. 


6.4 Java memory model, Hotspot. 


Finally, we discuss redundant read after read elimination, an optimisation per- 
formed by the Hotspot compiler but forbidden by the Java memory model. It 
is the first optimisation in the following sequence from Ševčík and Aspinall [30, 
Figure 5], used to demonstrate that the Java memory model is too strict, and 
unsound with respect to the observable behaviour of Sun’s Hotspot compiler. 
$T_3$:  r2 := y; if (r2 == 1) {r3 := y; x := r3} else {x := 1}
   →
$T_2$:  r2 := y; x := 1;
   →
$T_1$:  x := 1; r2 := y;


Consider the event structures of the unoptimised $T_3$ and optimised $T_2$.

[Figure: event structures of $T_3$ and $T_2$, with leaf writes to x.]


The optimisation removes the apparently redundant pair of reads (4, 6), then
reorders the now-independent write. This redundancy is represented in justification:
when prepending the top read of $y$ to the right-hand side of the event
structure, the existing justification $\{6\} \vdash 7$ is replaced by $\{3\} \vdash 7$. When coproduct is
applied, this matches with justification $\{1\} \vdash 2$, leading to the independent writes
$\vdash 2$ and $\vdash 7$. In a weak memory context however, a parallel thread could write a
value to $y$ between the two reads, thereby changing the value written to $x$. For
this reason, we keep event 4 in the denotation and create the dependency edge
$4 \xrightarrow{DP} 5$.

Despite exhibiting the same behaviour here, the denotations of $T_3$ and $T_2$ do
not match. We establish that the optimisation is sound in any context in the
next section.


7 Refinement 


We have shown in Section 5.1 that our semantics enjoys a compositionality
property: if we can prove that two programs have the same semantics (w.r.t.
set-theoretical equality) then they cannot be distinguished by any context. We
also explained how equality is too strict, as it does not allow us to relate all
programs that ought to be deemed semantically equivalent. Our Java Hotspot
compiler example in Section 6 shows that the program $T_3$ is in practice optimised
to $T_2$ and then to $T_1$. However, it is clearly not true that $\llbracket T_1 \rrbracket_{n}\,\rho\,\kappa$ is a subset of
$\llbracket T_3 \rrbracket_{n}\,\rho\,\kappa$.

In this section we present a coarser-grained relation, which we call refinement
($\preceq$). This relation permits the optimisations we want, but remains sound w.r.t.
the intuitive notion of observational equivalence, and it is closed under
context application in the same way as equality.
To show soundness we define observational refinement ($\preceq_{obs}$) which captures
the intuitive notion of program equivalence: one program is a permissible
optimisation of another if it does not increase the set of observable behaviours,
defined here as changes to values of observed variables. The definition identifies
related executions and compares the ordering of observable events, recognising
that adding happens-before edges restricts behaviour. We then define a refinement
relation and show this relation is a subset of observational refinement. This
is formally stated in the following lemma:


Lemma 2 (Soundness of Refinement (<Cops)). For all P) and P>, if 
[Pil po 3 [Poli po then [Pi]; po Sots [Polit p0 


Note that the refinement relation is defined over a tweaked version of the 
semantics, [-]”, a variant of [-] in which the registers are explicit in the event 
structure. 

Finally we show that $\preceq$ is compositional:


Theorem 2 (Compositionality of Refinement (=)). For all programs P, 
and Pz, and indexes n, if for all p, [Plt , gx [Pal , g then for all contexts C, 


p, k and K’ such that k < K' we have that [C[P,JJ2, . < [CIPA]? 


npr npr 


8 Showing implementability via IMM 


In this section we show that our calculation of relaxed dependencies can easily be 
reused to solve the thin-air problem in other state-of-the-art axiomatic models, 
drawing the advantages of these models over to ours. In particular, we augment 
the IMM and RC11 models of Podkopaev et al. [26]. We adopt their language, 
given below. It covers C++ atomics, fences, fetch-and-add and compare-and- 
swap operations but excludes locks. Note that locks are implementable using 
compare and swap operations. 


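\[
\begin{aligned}
M &::= n \mid r \\
B &::= M = M \mid B \wedge B \mid B \vee B \mid \neg B \\
P &::= T_1 \parallel \cdots \parallel T_n \\
T &::= \mathsf{skip} \mid r :=^{o_R} x \mid x :=^{o_W} M \mid T_1 ; T_2 \\
  &\quad \mid \mathsf{if}(B)\{P_1\}\{P_2\} \mid \mathsf{while}(B)\{P\} \\
  &\quad \mid \mathsf{fence}^{o_F} \mid r := \mathsf{FADD}^{o_R, o_W}_{o_{RMW}}(x, M) \\
  &\quad \mid \mathsf{CAS}^{o_R, o_W}_{o_{RMW}}(x, M, M) \\
o_R &::= \mathsf{rlx} \mid \mathsf{acq} \\
o_W &::= \mathsf{rlx} \mid \mathsf{rel} \\
o_F &::= \mathsf{acq} \mid \mathsf{rel} \mid \mathsf{acqrel} \mid \mathsf{sc} \\
o_{RMW} &::= \mathsf{normal} \mid \mathsf{strong}
\end{aligned}
\]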


First we provide a model, written (for a program $P$) as $\llbracket P \rrbracket_{MRD+IMM}$, that
combines our relaxed dependencies with the axiomatic model of IMM, here written
as $\llbracket P \rrbracket_{IMM}$. We will make these definitions precise shortly. We then show that
$\llbracket P \rrbracket_{MRD+IMM}$ is weaker than $\llbracket P \rrbracket_{IMM}$, making $\llbracket P \rrbracket_{MRD+IMM}$ implementable over
hardware architectures like x86-TSO, ARMv7, ARMv8 and Power. Secondly, we
relax the RC11 axiomatic model by using our relaxed dependencies model MRD
to create a new model $\llbracket P \rrbracket_{MRD\text{-}C11}$, and show this model weaker than the RC11
model. We argue that the mathematical description of $\llbracket P \rrbracket_{MRD\text{-}C11}$ is lightweight
and close to the C++ standard, so it would require minimal work to
augment the standard with the ideas presented in this paper.

To prove implementability over hardware architectures we define a pre-execution
semantics, where the relaxed dependency relation $DP$ is calculated along with the
data and control dependencies from IMM. To combine our model with IMM,
we redefine the $AR$ relation (we refer the reader to the IMM paper [26] for the
details on $AR$) such that it is parametrised by an arbitrary relation which we put
in place of the relations $(data \cup ctrl)$. $AR(data \cup ctrl)$ equals the original axiom
$AR$ and $AR(DP)$ is the same axiom where $DP$ is put in place of $data \cup ctrl$.

We define the executions in $\llbracket P \rrbracket_{MRD+IMM}$ as the maximal conflict-free sets
such that $AR(DP)$ is acyclic, and executions in $\llbracket P \rrbracket_{IMM}$ as the maximal conflict-free
sets such that $AR(data \cup ctrl)$ is acyclic.


8.1 Implementability 


We can now state and prove that the MRD model is implementable over IMM,
which gives us that MRD is implementable over x86-TSO, ARMv7, ARMv8,
Power and RISC-V by combining our result with the implementability result of
IMM.

Theorem 3 (MRD+IMM is weaker than IMM). For all programs $P$ by the
IMM model,

\[ \llbracket P \rrbracket_{MRD+IMM} \supseteq \llbracket P \rrbracket_{IMM} \]


9 Modular Relaxed Dependencies in RC11: MRD-C11 


We refer to the RC11 [18] model, as specified in Podkopaev et al. [26]. We call this
model $\llbracket P \rrbracket_{\mathrm{RC11}}$. While $\llbracket P \rrbracket_{\mathrm{RC11}}$ forbids thin-air executions, it is not weak enough:
it forbids common compiler optimisations by imposing that (E ∪ RF) is acyclic.
We relax this condition by similarly replacing E with our relaxed dependency
relation DP, this time calculated on our preserved program order relation (<).
We call this model $\llbracket P \rrbracket_{\mathrm{MRD\text{-}C11}}$. Mathematically, this is done by imposing that
(DP ∪ RF) is acyclic.
At this point, we prove the following lemma: 


Lemma 3 (Implementability of MRD-C11). For all programs P, 


$$\llbracket P \rrbracket_{\mathrm{MRD\text{-}C11}} \supseteq \llbracket P \rrbracket_{\mathrm{RC11}}$$


To show this it suffices to show that DP ⊆ E always holds. This is
straightforward by induction on the structure of P, observing that the only place
where dependencies go against E is when hoisting a write in the coproduct case.
However, in the same construction we always preserve the dependencies coming from
the different branches of the structure which, by the inductive hypothesis, always
agree with program order.
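
The step from an inclusion of relations to an inclusion of models is a monotonicity argument. It can be spelled out as follows; this is a sketch that uses the relation names as they appear above and assumes, as the text indicates, that the two models differ only in this one acyclicity axiom.

```latex
\begin{align*}
\mathrm{DP} \subseteq \mathrm{E}
  &\implies \mathrm{DP} \cup \mathrm{RF} \subseteq \mathrm{E} \cup \mathrm{RF} \\
  &\implies \bigl( \mathrm{E} \cup \mathrm{RF} \text{ acyclic}
            \implies \mathrm{DP} \cup \mathrm{RF} \text{ acyclic} \bigr) \\
  &\implies \llbracket P \rrbracket_{\mathrm{RC11}}
            \subseteq \llbracket P \rrbracket_{\mathrm{MRD\text{-}C11}}
\end{align*}
```

The second line holds because any cycle in the smaller relation is also a cycle in the larger one.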


9.1 MRD-C11 is DRF-SC


We show that MRD-C11 validates the DRF-SC theorem of the C++ standard [13, 
§6.8.2.1 paragraph 20]. 


Theorem 4 (MRD-C11 is DRF-SC). For a program P whose atomic accesses
are all SC-ordered, if there are no SC-consistent executions with a race over
non-atomics, then the outcomes of P under MRD-C11 coincide with those under
SC.


Sketch proof. In the absence of races and relaxed atomics, the no-thin-air
guarantee of RC11 is made redundant by the guarantee of happens-before acyclicity
shared by RC11 and MRD-C11. The result follows from this observation, Lemma 3
and Theorem 4 from Lahav et al. [18].


10 On the Promising Semantics and WEAKESTMO 


In this section we present examples that differentiate the Promising Semantics 
and WEAKESTMO from our MRD and MRD-C11 models. 

First, we show that MRD correctly forbids the out-of-thin-air behaviour in the 
litmus test Coh-CYC from Chakraborty and Vafeiadis [11]. The test, given below, 
differentiates Promising and WEAKESTMO: only the latter avoids the outcome 
r1 = 3, r2 = 2 and r3 = 1.


x := 2;                    ||  x := 1;
r1 := x; // 3              ||  r2 := x; // 2
if (r1 != 2) {y := 1}      ||  r3 := y; // 1
                           ||  if (r3 != 0) {x := 3}

MRD correctly forbids this outcome: it identifies a dependency on the left- 
hand thread from the read of 3 from x to the write y := 1, and on the right-hand 
thread from the read of 1 from y to the write x := 3. The desired outcome then 
has a cycle in dependency and reads-from, and it is forbidden. 

Chakraborty and Vafeiadis ascribe the behaviour to “a violation of coherence 
or a circular dependency”, and add specific machinery to WEAKESTMO that
checks for global coherence violations at each step of program execution. These 
global checks forbid the unwanted outcome. 

The Promising Semantics, on the other hand, can make promises that are not 
sensitive to coherence order, and therefore allows the above outcome erroneously. 

In Coh-CYC, enforcing coherence ordering at each step in WEAKESTMO was 
enough to forbid the thin-air behaviour, but it is not adequate in all cases. The 
example below features an outcome that Promising and WEAKESTMO allow, and 
that MRD-C11 and MRD forbid. It demonstrates that cycles in dependency can 
arise without violating coherence in WEAKESTMO. 


z := 1 || y := x || if (z != 0) {x := 1} {r0 := y; x := r0; a := r0}

The program is an adaptation⁶ of a Java test, where the unwanted outcome
represents a violation of type safety [20]. Observing the thin-air behaviour
where a = 1 in the adaptation above is the analogue of the unwanted outcome in 
the original test. If in the end a = 1, then the second branch of the conditional 
in the rightmost thread must execute. It contains a read of 1 from y, and a 
dependent write of x := 1. On the middle thread there is a read of 1 from x, and 
a dependent write of y := 1. These dependencies form the archetypal thin-air 
shape in the execution where a = 1. MRD correctly identifies these dependencies 
and the outcome is prohibited due to its cycle in reads-from and dependency. 

The a = 1 outcome is allowed in the Promising Semantics: a promise can be 
validated against the write of x := 1 in the true branch of the right-hand thread,
and later switched to a validation with x := r0 from the false branch, ignoring
the dependency on the read of y. 

In the previous example, Coh-CYC, a stepwise global coherence check caused 
WEAKESTMO to forbid the unwanted behaviour allowed by Promising, but that 
machinery does not apply here. WEAKESTMO allows the unwanted outcome, and 
we conjecture that this deficiency stems from the structure of the model. De- 
pendencies are not represented as a relation at the level of the global axiomatic 
constraint, so one cannot check that they are consistent with the dynamic exe- 
cution of memory, as represented by the other relations. Adopting a coherence 
check in the stepwise generation of the event structure mitigates this concern for 
Coh-CYC, but not for the test above. 

In contrast, MRD does represent dependencies as a relation, allowing us to 
check consistency with the RF relation here. The axiom that requires acyclicity 
of (DP ∪ RF) forbids the unwanted outcome, as desired.
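
The acyclicity checks in this section can be replayed mechanically. The sketch below is an illustration in Python and is not part of the MRD-er tool: it encodes only the dependency and reads-from edges named in the prose for the two tests and searches for a cycle in their union. The event labels (such as `L:R x=3` for the left thread's read of 3 from x) and the helper `find_cycle` are names invented here.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Edge = Tuple[str, str]


def find_cycle(edges: Iterable[Edge]) -> Optional[List[str]]:
    """Return one cycle of the directed graph given by `edges`, or None."""
    graph: Dict[str, List[str]] = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on the current path / finished
    colour = {node: WHITE for node in graph}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        colour[node] = GREY
        path.append(node)
        for succ in graph[node]:
            if colour[succ] == GREY:
                # Back edge: the cycle is the tail of the current path.
                return path[path.index(succ):] + [succ]
            if colour[succ] == WHITE:
                found = visit(succ)
                if found is not None:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in graph:
        if colour[node] == WHITE:
            found = visit(node)
            if found is not None:
                return found
    return None


# Coh-CYC, outcome r1 = 3, r2 = 2, r3 = 1 (L = left thread, R = right thread).
coh_cyc_dp = [("L:R x=3", "L:W y=1"), ("R:R y=1", "R:W x=3")]
coh_cyc_rf = [("R:W x=3", "L:R x=3"), ("L:W y=1", "R:R y=1"),
              ("L:W x=2", "R:R x=2")]

# Java-inspired test, outcome a = 1 (T2 = middle thread, T3 = rightmost thread).
java_dp = [("T2:R x=1", "T2:W y=1"), ("T3:R y=1", "T3:W x=1")]
java_rf = [("T3:W x=1", "T2:R x=1"), ("T2:W y=1", "T3:R y=1")]

coh_cyc_cycle = find_cycle(coh_cyc_dp + coh_cyc_rf)
java_cycle = find_cycle(java_dp + java_rf)
print("Coh-CYC cycle:", coh_cyc_cycle)
print("Java-inspired cycle:", java_cycle)
```

In both tests the reads-from edges alone are acyclic (`find_cycle(coh_cyc_rf)` returns `None`), so it is the dependency edges that close the cycle.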


11 Evaluating MRD-C11 with the MRD-er tool 


MRD-C11 is the first weak memory model to solve the thin-air problem for C++ 
atomics that has a tool for automatically evaluating litmus tests. Our tool, MRD- 
er, evaluates litmus tests under the base model, RC11 augmented with MRD, and 
IMM augmented with MRD. It has been used to check the result of every litmus 
test in this paper, together with many tests from the literature, including the 
Java Causality Test cases [7,11,15,16,18,25,26,27]. 

When evaluating whether a particular execution is allowed for a given test, a 
model that solves the thin-air problem must take other executions of the program 
into account. For example, the semantics of Pichon-Pharabod et al., having 
explored one execution path, may ultimately backtrack [25]. Jeffrey and Riely 
phrase their semantics as a two player game where at each turn, the player 
explores all forward executions of the program [15]. At each operational step, the 
Promising Semantics [16] has to run forwards in a limited local way to validate
that promised writes will be reached. The invisible events of Chakraborty et
al. [11] are used to similar effect.


6 James Riely, Alan Jeffrey and Radha Jagadeesan provided the precise example
presented here [28]. It is based on Fig. 8 of Lochbihler [20], and its problematic execution
under Promising was confirmed with the authors of Promising.

In MRD-C11, it is the calculation of justification that draws in information 
from other executions. This mechanism is localised: it avoids making choices
about the execution that prune behaviours, and it does not require backtracking. 
MRD-C11 acts in a “bottom-up” fashion, and modularity ensures that justifica- 
tions drawn from the continuation need not be recalculated. These properties 
have supported the development of MRD-er: automation of the model requires 
only a single pass across the program text to construct the denotation. 


12 Discussion 


Four recent papers have presented models that forbid thin-air values and permit 
previously challenging compiler optimisations. The key insight from these papers 
is that it is necessary to consider multiple program executions simultaneously. 
To do this, three of the four [15,25,11] use event structures, while the Promising 
Semantics [16] is a small-step operational semantics that explores future traces 
in order to take a step. 

Although the Promising Semantics [16] is quite different from MRD, its mech- 
anism for promising focuses on future writes, and MRD has parallels in its cal- 
culation of independent writes. Note also that both Promising’s certification 
mechanism and MRD’s lifting are thread-local. 

The previous event-structure-based models are superficially similar to MRD, 
but all have a fundamentally different approach from ours: Pichon-Pharabod and 
Sewell [25] use event structures as the state of a rewriting system; Jeffrey and 
Riely [14,15] build whole-program event structures and then use a global mech- 
anism to determine which executions are allowed; and Chakraborty et al. [11] 
transform an event structure using an operational semantics. In contrast, we fol- 
low a more traditional approach [33] where our event structures are used as the 
co-domain of a denotational semantics. Further, Jeffrey and Riely [14,15] and 
Pichon-Pharabod and Sewell [25] do not cover a significant subset of C++
relaxed concurrency primitives. 

MRD does not suffer from known problems with existing models. As noted 
by Kang et al. [16], the Pichon-Pharabod and Sewell model produces behaviour 
incompatible with the ARM architecture. The Jeffrey and Riely model forbids 
the reordering of independent reads, as demonstrated by Java Causality Test 7 
(see Section 6.2). The Promising semantics allows the cyclic coherence ordering 
of the problematic Coh-CYC example [11]. WEAKESTMO allows the thin-air out- 
come in the Java-inspired test of Section 10. In all four cases MRD provides the 
correct behaviour. 

MRD is also highly compatible with the existing C++ standard text. The 
DP relation generated by MRD can be used directly in the axiomatic model to 
forbid thin-air behaviour. We are working on standards text with the ISO C++ 
committee based on this work, and have a current working paper with them [5]. 

The notion in C++ that data-race free programs should not exhibit observable
weak behaviours goes back to Adve and Hill [1], and formed the basis of
the original proposal for C++ [10]. This was formalised by Batty et al. [8] and 
adopted into the ISO standard. Despite the pervasiveness of DRF-SC theorems 
for weak memory models, these have remained whole-program theorems that 
do not support breaking a program into separate DRF and racy components. 
Our DRF theorem for our denotational model demonstrates a limited form of 
modularity that merits further exploration. 

Other denotational approaches to relaxed concurrency have not tackled the 
thin-air problem. Dodds et al. [12] build a denotational model based on an 
axiomatic model similar to C++. It forms the basis of a sound refinement relation 
and is used to validate data-structures and optimisations. Their context language 
is too restrictive to support a compositional semantics, and their compromise 
to disallow thin-air executions forbids important optimisations. Kavanagh and 
Brookes [17] provide a denotational account of TSO concurrency, but their model 
is based on pomsets and suffers from the same limitation as axiomatic models [7]: 
it cannot be made to recognise false dependencies. 


Future Work. We envisage a generalised theorem that would, on augmentation 
with MRD, extend an axiomatic DRF-SC proof to a proof that applies to the 
augmented model. 

The ISO have struggled to define memory_order::consume [13]. It is intended
to provide ordering through dependencies that the compiler will not optimise 
away. The semantic dependency relation calculated by MRD identifies just these 
dependencies, and may support a better definition. 

Finally, where we have used a global semantics to provide a full C++ model, 
it would be interesting to extend the denotational semantics to also cover all of 
C++, thereby allowing reasoning about C++ code in isolation from its context. 


13 Conclusions 


We have used the relatively recent insight that to avoid thin-air problems, a 
semantics should consider some information about what might happen in other 
program executions. We codify that into a modular notion of justification,
leading to a semantic notion of independent writes, and finally of dependency
(DP). We demonstrate the effectiveness of these concepts in three ways. One,
we define a denotational semantics for a weak memory model, show it supports
DRF-SC, and build a compositional refinement relation strong enough to verify
difficult optimisations. Two, we show how to use DP with other axiomatic models,
supporting the first optimal implementability proof for a thin-air solution via
IMM, and showing how to repair the ISO C++ model. Three, we build a tool
for executing litmus tests allowing us to check a large number of examples. 


Abstract. Computing relies on architecture specifications to decouple
hardware and software development. Historically these have been prose 
documents, with all the problems that entails, but research over the 
last ten years has developed rigorous and executable-as-test-oracle spec- 
ifications of mainstream architecture instruction sets and “user-mode” 
concurrency, clarifying architectures and bringing them into the scope of 
programming-language semantics and verification. However, the system 
semantics, of instruction-fetch and cache maintenance, exceptions and 
interrupts, and address translation, remains obscure, leaving us without 
a solid foundation for verification of security-critical systems software. 


In this paper we establish a robust model for one aspect of system se- 
mantics: instruction fetch and cache maintenance for ARMv8-A. Sys- 
tems code relies on executing instructions that were written by data 
writes, e.g. in program loading, dynamic linking, JIT compilation, de- 
bugging, and OS configuration, but hardware implementations are often 
highly optimised, e.g. with instruction caches, linefill buffers, out-of-order 
fetching, branch prediction, and instruction prefetching, which can affect 
programmer-observable behaviour. It is essential, both for programming 
and verification, to abstract from such microarchitectural details as much 
as possible, but no more. We explore the key architecture design ques- 
tions with a series of examples, discussed in detail with senior Arm staff; 
capture the architectural intent in operational and axiomatic seman- 
tic models, extending previous work on “user-mode” concurrency; make 
these models executable as test oracles for small examples; and experi- 
mentally validate them against hardware behaviour (finding a bug in one 
hardware device). We thereby bring these subtle issues into the mathe- 
matical domain, clarifying the architecture and enabling future work on 
system software verification. 


1 Introduction 


Computing relies on the architectural abstraction: the specification of an en- 
velope of allowed hardware behaviour that hardware implementations should 
lie within, and that software should assume. These interfaces, defined by hard- 
ware vendors and relatively stable over time, notionally decouple hardware and 
software development; they are also, in principle, the foundation for software verification.
In practice, however, industrial architectures have accumulated great
complexity and subtlety: the ARMv8-A and Intel architecture reference manuals 
are now 7476 and 4922 pages [9,26], and hardware optimisations, including out- 
of-order and speculative execution, result in surprising and poorly-understood 
programmer-observable behaviour. Architecture specifications have historically 
also been entirely informal, describing these complex envelopes of allowed behaviour
solely in prose and pseudocode. This is problematic in many ways: they do not
serve as clear documentation, with the inevitable ambiguity and incompleteness
of informal prose leaving major questions unanswered; without a specification 
that is executable as a test oracle (that can decide whether some observed be- 
haviour is allowed or not), hardware validation relies on test suites that must be 
manually curated; without an architecturally-complete emulator (that can ex- 
hibit all allowed behaviour), it is very hard for software developers to “program to 
the specification” — they rely on test-and-debug development, and can only test 
above the hardware implementation(s) they have; and without a mathematically 
rigorous semantics, formal verification of hardware or software is impossible. 


Over the last 10 years, much has been done to put architecture specifications 
on a more rigorous footing, so that a single specification can serve all those 
purposes. There are three main problems, two of which are now largely solved. 


The first is the instruction-set architecture (ISA): the specification of the 
sequential behaviour of individual instructions. This is chiefly a problem of scale: 
modern industrial architectures such as Arm or x86 have large instruction sets, 
and each instruction involves many details, including its behaviour at different 
privilege levels, virtual-to-physical address translation, and so on — a single Arm 
instruction might involve hundreds of auxiliary functions. Recent work by Reid 
et al. within Arm [40,41,42] transitioned their internal ISA description into a 
mechanised form, used both for documentation and testing, and with him we 
automatically translated this into publicly available Sail definitions and thence 
into theorem-prover definitions [11,10]. Other related work is in §7. 


The second is the relaxed-memory concurrent behaviour of “user-mode” op- 
erations: memory writes and reads, and the mechanisms that architectures pro- 
vide to enforce ordering and atomicity (dependencies, memory barriers, load- 
linked/store-conditional operations, etc.). In 2008, for ARMv7, IBM POWER, 
and x86, this was poorly understood, and the architects regarded even their own 
prose specifications as inscrutable. Now, following extensive work by many peo- 
ple [36,37,19,18,22,8,31,45,7,46,48,35,6,2,47,13,1], ARMv8-A has a well-defined 
and simplified model as part of its specification [9, B2.3], including a prose 
transcription of a mathematical model [15], and an equivalence proof between 
operational and axiomatic presentations [36,37]; RISC-V has adopted a similar 
model [52]; and IBM POWER and x86 have well-established de-facto-standard 
models. All of these are experimentally validated against hardware, and sup- 
ported by tools for exhaustively running tests [17,4]. The combination of these 
models and the ISA semantics above is enough to let one reason about or model- 
check concurrent algorithms. 

That leaves the third part of the problem: the “system” semantics, of
instruction-fetch and cache maintenance, exceptions and interrupts, and ad- 
dress translation and TLB (translation lookaside buffer) maintenance. Just as 
for “user-mode” relaxed memory, these are all areas where microarchitectural op- 
timisations can have surprising programmer-visible effects, especially in the con- 
current context. The mechanisms are relied on by all code, but they are explicitly 
managed only by systems code, in just-in-time (JIT) compilers, dynamic loaders, 
operating-system (OS) kernels, and hypervisors. This is, of course, exactly the 
security-critical computing base, currently trusted but not trustworthy, that is 
especially in need of verification — which requires a precise and well-validated 
definition of the architectural abstraction. Previous work has scarcely touched 
on this: none of seL4 [27], CertiKOS [24,23], Komodo [16], or [25,12], address 
realistic architecture concurrency, and they use (at best) idealised models of the 
sequential systems architecture. The CakeML [51,28] and CompCert [29] verified 
compilers target only sequential user-mode ISA fragments. 


In this paper we focus on one aspect of system semantics: instruction fetch 
and cache maintenance, for ARMv8-A. The ability to execute code that has 
previously been written to data memory is fundamental to computing: fine-grained
self-modifying code is now rare, and (rightly) deprecated, but program
loading, dynamic linking, JIT compilation, debugging, and OS configuration all 
rely on executing code from data writes. However, because these are relatively 
infrequent operations, hardware designers have been able to optimise by partially 
separating the instruction and data paths, e.g. with distinct instruction caching, 
which by default may not be coherent with data accesses. This can introduce 
programmer-visible behaviour analogous to that of user-mode relaxed-memory 
concurrency, and require specific additional synchronisation to correctly pick up 
code modifications. Exactly what these are is not entirely clear in the current 
ARMv8-A architecture text, just as pre-2018 user-mode concurrency was not. 


Our main contribution is to clarify this situation, developing precise abstractions
that bring the instruction-fetch part of ARMv8-A system behaviour into
the domain of rigorous semantics. Arm have stated [private communication] 
that they intend to incorporate a version of this into their architecture. We aim 
thereby to enable future work on system software verification using the tech- 
niques of programming languages research: program analysis, model-checking, 
program logics, etc. We begin (§2) by recalling the informal architectural guar- 
antees that Arm provide, and the ways in which real-world software systems 
such as Linux, JavaScript, and WebAssembly change instruction memory. Then: 


(1) We explore the fundamental phenomena and architecture de- 
sign questions with a series of examples (§3). We explore the interactions 
between instruction fetching, cache maintenance and the ‘usual’ relaxed mem- 
ory stores and loads, showing that instruction fetches are more relaxed, and 
how even fundamental coherence guarantees for data memory do not apply to 
instruction fetches. Most of these questions arose during the development of our 
models, in detailed ongoing discussion with the Arm Chief Architect and other 
Arm staff. They include questions of several different kinds. Six are clear from 
the Arm prose specification. Of the others: two are not implied by the prose but
are natural choices; five involved substantive new choices by Arm that had not 
previously been considered and/or documented; for two, either choice could be 
reasonable, and Arm chose the simpler (and weaker) option; and for one, Arm 
were independently already strengthening the architecture to accommodate ex- 
isting software. 


(2) We give an operational semantics for Arm instruction fetch 
and icache maintenance (§4). This is in an abstract-microarchitectural style 
that supports an operational intuition for how hardware actually works, while 
abstracting from the mass of detail and the microarchitectural variation of actual 
hardware implementations. We do so by extending the Flat model [37] with 
simple abstractions of instruction caches and the coherent data cache network, 
in a way that captures the architectural intent, defining the entire envelope of 
behaviours that implementations should be allowed to exhibit. 


(3) We give a more concise presentation of the model in an ax- 
iomatic style (§5), extending the “user-mode” axiomatic model from previous 
work [37,36,15,9], and intended to be functionally equivalent. We discuss how 
this too matches the architectural intent. 


(4) We validate all this in two ways: by the extensive discussion with 
Arm staff mentioned above, and by experimental testing of hardware behaviour, 
on a selection of ARMv8-A cores designed by multiple vendors (§6). We run 
tests on hardware with a mild extension of the Litmus tool [5,7]. We make the 
operational model executable as a test oracle by integrating it into the RMEM 
tool and its web interface [17], introducing optimisations that make it possible 
to exhaustively execute the examples. We make the axiomatic model executable 
as a test oracle with a new tool that takes litmus tests and uses a Sail [11] 
definition of a fragment of the ARMv8-A ISA to generate SMT problems for the 
model. We then compare hardware and the two models for the handwritten tests 
(modulo two tests not supported by the axiomatic checker), compare hardware 
and the operational model on a suite of 1456 tests, automatically generated 
with an extension of the diy tool [3], and check the operational and axiomatic 
models against sets of previous non-ifetch tests. In all this data our models are 
equivalent to each other and consistent with hardware observations, except for 
one case where our testing uncovered a hardware bug on a Qualcomm device. 


Finally, we discuss other related work (§7) and conclude (§8). We do all this 
for ARMv8-A, but other relaxed architectures, e.g. IBM POWER and RISC-V, 
face similar issues; our tests and tooling should enable corresponding work there. 

The models are too large to include or explain in full here, so we focus 
on explaining the motivating examples, the main intuition and style of the 
operational model, in a prose rendering of its executable mathematics, and 
the definition of the axiomatic model. Appendices provide additional examples,
a complete prose description of the operational model, and additional explanation
of the axiomatic model. The complete executable mathematics version,
the web-interface tool for running it, and our test results are at
https://www.cl.cam.ac.uk/~pes20/iflat/.


Caveats and Limitations. Our executable models are integrated with a substantial
fragment of the Sail ARMv8-A ISA (similar to that used for CakeML), but
not yet with the full ISA model [11,40,41,42]; this is just a matter of additional 
engineering. We only handle the 64-bit AArch64 part of ARMv8-A, not AArch32. 
We do not handle the interaction between instruction fetch and mixed-size ac- 
cesses, or other variants of the cache maintenance instructions, e.g. those used for 
interaction with DMA engines, and variants by set or way instead of by virtual 
address. Finally, the equivalence between our operational and axiomatic models 
is validated experimentally. A proof of this equivalence is essential in the long 
term, but would be a major work in itself: the complexity makes mechanisation 
essential, but the operational model (in all its scale and complexity) has not yet 
been subject to mechanised proof. Without instruction fetch, a non-mechanised 
proof was the main result of an entire PhD thesis [36], and we expect the addition 
of instruction fetch to require global changes to the argument. 


2 Industry Practice and the Existing ARMv8-A Prose 


Computer architecture relies on a host of sophisticated techniques, including 
buffering, caching, prediction, and pipelining, for performance. For the normal 
memory reads and writes of “user-mode” concurrency, the programmer-visible 
relaxed-memory effects largely arise from store buffering and from out-of-order 
and speculative pipeline behaviour, not from the cache hierarchy (though some 
IBM POWER phenomena do arise from the interconnect, and from late process- 
ing of cache invalidates). All major architectures provide a strong per-location 
guarantee of coherence: for each memory location, different threads cannot ob- 
serve the writes to that location in different orders. This is implemented in 
hardware by coherent cache protocols, ensuring (roughly) that each cache line is 
writable by at most one hardware thread at a time, and by additional machinery 
restricting store buffer and pipeline behaviour. Then each architecture provides 
additional synchronisation mechanisms to let the programmer enforce ordering 
properties involving multiple locations. 

At first sight, one might expect instruction fetches to act like other memory 
reads but, because writes to instruction memory are relatively rare, hardware de- 
signers have adopted different caching mechanisms. The Arm architecture care- 
fully does not mandate exactly what these must be, to allow a wide range of 
possible hardware implementations, but, for example, a high-performance Arm 
processor might have per-core separate L1 instruction and data caches, above 
a unified per-core L2 cache and an L3 cache shared between cores. There may 
also be additional structures, e.g. per-core fetch queues, and caching of decoded 
micro-operations. This instruction caching is not necessarily coherent with data 
memory accesses: “the architecture does not require the hardware to ensure co- 
herency between instruction caches and memory” [9, B2.4.4 (B2-114)]; instead, 
programmers must use explicit cache maintenance instructions. The documenta- 
tion gives a particular sequence of these: “If software requires coherency between 
instruction execution and memory, it must manage this coherency using Context 
synchronization events and cache maintenance instructions. The following code
sequence can be used to allow a processing element (PE) to execute code that the 
same PE has written.” 


; Coherency example for data and instruction accesses [...]
; Enter this code with <Wt> containing a new 32-bit instruction,
; to be held in Cacheable space at a location pointed to by Xn.
STR Wt, [Xn]   ; Store new instruction
DC CVAU, Xn    ; Clean data cache by virtual address (VA) to PoU
DSB ISH        ; Ensure visibility of the data cleaned from cache
IC IVAU, Xn    ; Invalidate instruction cache by VA to PoU
DSB ISH        ; Ensure completion of the invalidations
ISB            ; Synchronize the fetched instruction stream


At first sight, this may be entirely mysterious. The remainder of the paper es- 
tablishes precise semantics for each instruction, explaining why each is required, 
but as a rough intuition: 


1. The DC CVAU, Xn cleans this core’s data cache for address Xn, pushing the new 
write far enough down the hierarchy for an instruction fetch that misses in 
the instruction cache to be guaranteed to see the new value. This point is the 
Point of Unification (PoU) and is usually the point where the instruction 
and data caches become unified (L2 for most modern devices). 

2. The DSB ISH waits for the clean to have happened before letting the later 
instructions execute (without this, the sequence itself can execute out-of- 
order, and the clean might not have pushed the write down far enough before 
the instruction cache is updated). The ISH makes this specific to the Inner 
Shareable Domain: the processor itself, not the system-on-chip. We do not 
model shareability domains in this paper, so this is equivalent to a DSB SY. 

3. The IC IVAU, Xn invalidates any entry for that address in the instruction
caches for all cores, forcing any future fetch to miss in the instruction cache, 
and instead read the new value from the data memory hierarchy; it also 
touches some fetch queue machinery. 

4. The second DSB ISH ensures the invalidation completes. 

5. The final ISB flushes this core’s pipeline, forcing a re-fetch of all program- 
order-later instructions. 
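
The role of the two cache maintenance steps can be illustrated with a deliberately tiny simulation. The Python sketch below is a toy, not the operational model of this paper: it has one core, executes strictly in program order (so the DSB and ISB steps have nothing to do and are omitted), and assumes the worst case in which the data cache never writes back and the instruction cache never evicts on its own. Real hardware may do either spontaneously, so a stale fetch is allowed rather than guaranteed. The class `ToyCore`, its fields, and the address `0x1000` are all names invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyCore:
    """Worst-case toy of one core's caches; hypothetical, for intuition only."""
    pou: Dict[int, str] = field(default_factory=dict)     # memory at the Point of Unification
    dcache: Dict[int, str] = field(default_factory=dict)  # writes not yet cleaned to the PoU
    icache: Dict[int, str] = field(default_factory=dict)  # possibly stale instruction copies

    def store(self, addr: int, instr: str) -> None:
        """STR: the new instruction lands in the data cache only."""
        self.dcache[addr] = instr

    def dc_cvau(self, addr: int) -> None:
        """DC CVAU: push any pending write for addr down to the PoU."""
        if addr in self.dcache:
            self.pou[addr] = self.dcache.pop(addr)

    def ic_ivau(self, addr: int) -> None:
        """IC IVAU: drop the instruction-cache entry for addr, if any."""
        self.icache.pop(addr, None)

    def fetch(self, addr: int) -> str:
        """Instruction fetch: hit in the icache, otherwise fill from the PoU."""
        if addr not in self.icache:
            self.icache[addr] = self.pou[addr]
        return self.icache[addr]


def run(clean: bool, invalidate: bool) -> str:
    """Overwrite an already-fetched instruction and report what is fetched next."""
    core = ToyCore(pou={0x1000: "old"})
    core.fetch(0x1000)          # warm the icache with the old instruction
    core.store(0x1000, "new")
    if clean:
        core.dc_cvau(0x1000)
    if invalidate:
        core.ic_ivau(0x1000)
    return core.fetch(0x1000)


results = {(c, i): run(c, i) for c in (False, True) for i in (False, True)}
for (clean, invalidate), fetched in results.items():
    print(f"clean={clean!s:5} invalidate={invalidate!s:5} -> fetch sees {fetched}")
```

In this toy only the run that performs both the clean and the invalidate fetches the new instruction: without the clean, the refill reads the old value from the PoU, and without the invalidate, the fetch hits the stale instruction-cache entry.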


Some hardware implementations provide extra guarantees, rendering the DC or 
IC instructions unnecessary. Arm allow software to discover this in an architectural
way, by reading the CTR_EL0 register’s DIC and IDC bits. Our modelling
handles this, but for brevity we only discuss the weakest case, with
CTR_EL0.DIC=CTR_EL0.IDC=0, that requires full cache maintenance.
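
As a rough illustration of how software might interpret these bits, the Python sketch below decodes a raw register value and reports which of the two cache maintenance instructions are still needed. The bit positions (IDC at bit 28, DIC at bit 29) are an assumption taken from the Arm architecture reference manual rather than from the text above, and the function name `maintenance_needed` is hypothetical. The sketch says nothing about which barriers remain necessary.

```python
from typing import Dict

# Assumed field positions within CTR_EL0 (not stated in the surrounding text).
CTR_EL0_IDC_BIT = 28  # 1: data cache clean to the PoU is not required
CTR_EL0_DIC_BIT = 29  # 1: instruction cache invalidation to the PoU is not required


def maintenance_needed(ctr_el0: int) -> Dict[str, bool]:
    """Map each cache maintenance instruction to whether it is still required."""
    idc = (ctr_el0 >> CTR_EL0_IDC_BIT) & 1
    dic = (ctr_el0 >> CTR_EL0_DIC_BIT) & 1
    return {"DC CVAU": idc == 0, "IC IVAU": dic == 0}


# The weakest case discussed in this paper: DIC = IDC = 0.
weakest = maintenance_needed(0)
print(weakest)
```

On real hardware the register value would come from an `MRS` read of CTR_EL0; here it is passed in as a plain integer so that the decoding can be tried anywhere.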

Arm make clear that instructions can be prefetched (perhaps speculatively): 
“How far ahead of the current point of execution instructions are fetched from 
is IMPLEMENTATION DEFINED. Such prefetching can be either a fixed or a 
dynamically varying number of instructions, and can follow any or all possible 
future execution paths. For all types of memory, the PE might have fetched the 
instructions from memory at any time since the last Context synchronization 
event on that PE.” 

Concurrent modification and instruction fetch require the same sequence, 
with an ISB on each thread that executes the new instructions, and the rest of 
the sequence on the modifying thread [9, B2.2.5 (B2-94)]. Concurrent modifica- 
tion without synchronisation is restricted to particular instructions (B (branch), 
BL (branch-and-link), BRK (break), SMC, HVC, SVC (secure monitor, hypervisor, 
and supervisor calls), ISB, and NOP), otherwise there could be constrained unpre- 
dictable behaviour: “any behavior that can be achieved by executing any sequence 
of instructions that can be executed from the same Exception level”. Concurrent 
modification of conditional branches is allowed but can result in the old condition 
with the new target address or vice versa. 

All this gives some guidance for programmers, but it leaves the exact seman- 
tics of instruction fetch and those cache maintenance instructions unclear, and in 
practice software typically does not use the above sequence verbatim. For exam- 
ple, it may synchronise a range of addresses at once, looping the DC and IC parts, 
or the final ISB may be subsumed by instruction synchronisation from exception 
entry or return. Linux has many places where it modifies code at runtime: in 
boot-time patching of alternatives, modifying kernel code to specialise it to the 
particular hardware being run on; when the kernel loads code (e.g. when the user 
calls dlopen); and in the ptrace system call, used e.g. by the GDB debugger to 
patch arbitrary instructions with breakpoints at runtime. In Google’s Chrome 
web browser, its WebAssembly and JavaScript just-in-time (JIT) compilers are 
required to both write new code during execution and modify existing code at 
runtime. In JavaScript, this modification happens inside a single thread and so is 
quite straightforward. The WebAssembly case is more complex, as one thread is 
modifying the code of another. A software thread can also be moved (by the OS 
or hypervisor) from one hardware thread to another, perhaps while it is in the 
middle of some instruction cache maintenance. Moreover, for security reasoning, 
we have to be able to bound the possible behaviour of arbitrary code. 
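
The range-based variant mentioned above, which loops the DC and IC parts, can be sketched as follows. This is a minimal Python sketch that only lists the instructions to issue. The function name and the 64-byte line sizes are assumptions for illustration; real code would derive the minimum line sizes from CTR_EL0 rather than hard-coding them.

```python
def range_sync_sequence(start, length, dline=64, iline=64):
    """List the maintenance instructions for the byte range [start, start+length).

    dline / iline are assumed minimum data / instruction cache line sizes in bytes
    (hypothetical defaults, must be powers of two).
    """
    if length <= 0:
        return []
    end = start + length
    # One DC per data cache line touched by the range.
    seq = [f"DC CVAU,{a:#x}" for a in range(start & ~(dline - 1), end, dline)]
    seq.append("DSB ISH")
    # One IC per instruction cache line touched by the range.
    seq += [f"IC IVAU,{a:#x}" for a in range(start & ~(iline - 1), end, iline)]
    seq += ["DSB ISH", "ISB"]
    return seq


# A 128-byte region starting at 0x1004 touches three 64-byte lines.
example = range_sync_sequence(0x1004, 0x80)
```

The barriers are issued once per range rather than once per line, which is the shape the text describes.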

All this means that we cannot treat the above sequence as a whole, as an 
opaque black box. Instead, we need a precise semantics for each individual in- 
struction, but the existing prose documentation does not provide that. 

The problem we face is to give such a semantics, that correctly defines be- 
haviour in arbitrary concurrent contexts, that captures the Arm architectural 
intent, that is strong enough for software, and that abstracts from the variety 
of hardware implementations (e.g. with differing cache structures) that the ar- 
chitecture intends to allow — but which programmers should not have to think 
about. 


3 Instruction Fetch Phenomena and Examples 


We now describe the main instruction-fetch phenomena and architecture design 
questions for ARMv8-A, illustrated by handwritten litmus tests, to guide the 
following model design. 


3.1 Instruction-Fetch Atomicity 


The first point, as mentioned in §2, is that concurrent modification and fetch 
is only permitted if the original and modified instructions are in a particular 
set: various branches, supervisor/hypervisor/secure-monitor calls, the ISB
instruction synchronisation barrier, and NOP. Otherwise, the architecture permits 
constrained unpredictable behaviour, meaning that the resulting machine state 
could be anything that would be reachable by arbitrary instructions at the same 
exception level. The following W+F test illustrates this. 


W+F                                                              AArch64
Initial state: 0:W0="SUB X0,X0,#1", 0:X1=l

Thread 0                               | Thread 1
STR W0,[X1] // modify Thread 1 at l    | l: ADD X0,X0,#1 // initial code

Allowed: constrained-unpredictable final state


In this test Thread 0 performs a memory store (with the STR instruction) 
to the code that Thread 1 is executing; overwriting the ADD X0,X0,#1 instruc- 
tion with the 32-bit encoding of the SUB X0,X0,#1 instruction. If the fetch were 
atomic, the outcome of this test would be the result of executing either the ADD 
or the SUB instruction, but, since at least one of those is not in the set of the 
8 atomically-fetchable instructions given previously, Thread 1 has constrained- 
unpredictable behaviour and the final state is very loosely constrained. Note, 
however, that this is nonetheless much stronger than the C/C++ whole-program 
undefined behaviour in the presence of a data race: unlike C/C++, a hardware 
architecture has to define a useful envelope of behaviour for arbitrary code, to 
provide guarantees for the rest of the system when one user thread has a race. 
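
The atomicity condition can be phrased as a simple membership check. The following Python sketch is illustrative only: the helper names are ours, instructions are compared by mnemonic alone, and conditional branches (treated separately below) are deliberately left out of the set.

```python
# The eight instructions that may be modified concurrently with fetch (from the text).
CONCURRENTLY_MODIFIABLE = {"B", "BL", "BRK", "SMC", "HVC", "SVC", "ISB", "NOP"}


def mnemonic(instr):
    """Return the upper-cased mnemonic of an assembly string such as 'ADD X0,X0,#1'."""
    return instr.split()[0].upper()


def modification_is_predictable(old, new):
    """True if overwriting `old` with `new` while it may be fetched stays predictable.

    Both the original and the modified instruction must be in the permitted set.
    """
    return (mnemonic(old) in CONCURRENTLY_MODIFIABLE
            and mnemonic(new) in CONCURRENTLY_MODIFIABLE)


# W+F overwrites an ADD with a SUB, so it falls outside the permitted set.
w_plus_f = modification_is_predictable("ADD X0,X0,#1", "SUB X0,X0,#1")
```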


Conditional Branches For conditional branches, the Arm architecture provides
a specific non-single-copy-atomic fetch guarantee: the execution will be
consistent with either the old or new target, and either the old or new condition.
For example, this W+F+branches test can overwrite a B.EQ g with a B.NE h,
and end up executing B.NE g or B.EQ h instead of one of those. Our future
examples will only modify NOPs and unconditional branch instructions.


W+F+branches                                                     AArch64
Initial state: 0:W0="B.NE h", 0:X1=l

Thread 0        | Thread 1
STR W0,[X1]     | l: B.EQ g

Allowed: execute "B.NE g"
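
The guarantee for conditional branches amounts to a product of the old and new conditions with the old and new targets. A minimal Python sketch, in which the function name and the (condition, target) pair representation are assumptions for illustration:

```python
from itertools import product


def possible_conditional_branches(old, new):
    """Enumerate the branches that a racing fetch may end up executing.

    `old` and `new` are (condition, target) pairs; any mix of the two conditions
    with the two targets is possible.
    """
    (old_cond, old_target), (new_cond, new_target) = old, new
    return sorted(set(product({old_cond, new_cond}, {old_target, new_target})))


# W+F+branches: B.EQ g overwritten with B.NE h.
outcomes = possible_conditional_branches(("EQ", "g"), ("NE", "h"))
```

For W+F+branches this yields four candidates: the two unmixed branches plus the mixed B.NE g and B.EQ h.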


3.2 Coherence 


Data writes and reads are coherent, in Arm and in other major architectures: 
in any execution, for each address, the reads of each hardware thread must see 
a subsequence of the total coherence order of all writes to that address. The 
plain-data CoRR test [46] illustrates one case of this: it is forbidden for a thread 
to read a new write of x and then the initial state for x. However, instruction 
fetches are not necessarily coherent: one instruction fetch may be inconsistent 
with a program-order-previous fetch, and the data and instruction streams can 
become out-of-sync with each other. We explore three kinds of coherence: 


— Instruction-to-Instruction Coherence: whether fetches of the same location 
must observe writes to the same location coherently. 

— Data-to-Instruction Coherence: whether fetches and then reads to the same 
location must observe writes to the same location coherently. 

— Instruction-to-Data Coherence: whether reads and then fetches of the same 
location must observe writes to the same location coherently. 


Instruction-to-Instruction Coherence Arm explicitly do not guarantee any 
consistency between fetches of the same location: fetching an instruction does 
not mean that a later fetch of that location will not see an older instruction [9, 
B2.4.4]. This is illustrated by CoFF, like CoRR but with fetches instead of reads. 


CoFF                                                             AArch64
Initial state: 0:W0="B l1", 0:X1=f

Thread 0            | Thread 1       | Common
STR W0,[X1] // a    | BL f           | f:  B l0
                    | MOV X0,X10     | l1: MOV X10,#2
                    | BL f           |     RET
                    | MOV X1,X10     | l0: MOV X10,#1
                    |                |     RET

Allowed: 1:X0=2, 1:X1=1

Execution: a:write f=B l1 -irf-> b:fetch f=B l1; b -fpo-> c:fetch f=B l0;
initial state -irf-> c.


Here Thread 1 makes two calls to address f (BL is branch-and-link), while 
Thread 0 overwrites the instruction at that address. The interesting potential 
execution is that in which the first call to f fetches and executes the newly- 
written B l1, but the second call fetches and executes the original B l0. We can 
view such executions as graphs, similar to previous axiomatic-model candidate 
executions but with new fetch events, one per instruction, and new edges. As 
usual, we use po and rf edges for the program-order and reads-from relations, 
together with: 


— fe (fetch-to-execute), which relates the fetch event of an instruction to all 
the execution events (memory writes, reads or barriers) of the instruction; 

— irf (instruction-read-from), relating a write to all fetches that read from it 
(analogous to reads-from, rf); and 

— fpo (fetch-program-order), relating fetches of instructions that are in pro- 
gram order (analogous to program order, po). 


Edges from the initial state are drawn from a small circle. Since we do not modify 
the code of most locations, we usually omit the fetch events for those instructions, 
showing only a subgraph of the interesting events, e.g. as on the right above. For 
Arm, this execution is both architecturally allowed and experimentally observed. 

Here, and in future tests, we assume some common code consisting of a 
function at address f which always has the same shape: a branch that might 
be overwritten, which selects a block that writes a value to register X10 before 
returning. This is sometimes duplicated at different addresses (f1, f2, ...) or 
extended to g, with three cases. We sometimes elide the common code. 


Data-to-Instruction Coherence Fetching from a particular write does imply 
that program-order-later reads from the same address will see that write (or a 
coherence successor thereof). This is a data-to-instruction coherence property, 
illustrated by CoFR below. Here Thread 1 fetches the newly-written B l1 at f 
and then, when reading from f with its LDR load instruction, cannot read the 
original B l0 instruction (it can only read the new B l1). 


CoFR                                                             AArch64
Initial state: 0:W0="B l1", 0:X1=f, 1:X2=f

Thread 0        | Thread 1        | Common
STR W0,[X1]     | BL f            | f:  B l0
                | MOV X0,X10      | l1: MOV X10,#2
                | LDR X1,[X2]     |     RET
                |                 | l0: MOV X10,#1
                |                 |     RET

Forbidden: 1:X0=2, 1:X1="B l0"

Execution: a:write f=B l1 -irf-> b:fetch f=B l1; b -fpo-> c:fetch LDR X1,[X2];
c -fe-> d:read f=B l0; initial state -rf-> d.


This is not clear in the existing prose specification, but the architectural 
intent that emerged during discussion with Arm is that the given execution 
should be forbidden, reflecting microarchitectural choices that (1) instructions 
decode in order, so the fetch b must occur before the read d, and (2) fetches that 
miss in the instruction cache must read from data storage, so the instruction 
cache cannot be ahead of the available data. This ensures that fetching from a 
write means that all threads are now guaranteed to read from that write (or 
another coherence-after it). 


Instruction-to-Data Coherence In the other direction, reading from a
particular write to some location does not imply that later fetches of that location 
will see that write (or a coherence successor), as in the following CoRF+ctrl-isb. 


CoRF+ctrl-isb                                                    AArch64
Initial state: 0:W0="B l1", 0:X1=f, 1:X2=f

Thread 0        | Thread 1        | Common
STR W0,[X1]     | LDR X0,[X2]     | f:  B l0
                | CBNZ X0,l       | l1: MOV X10,#2
                | l: ISB          |     RET
                | BL f            | l0: MOV X10,#1
                | MOV X1,X10      |     RET

Allowed: 1:X0="B l1", 1:X1=1

Execution: a:write f=B l1 -rf-> b:read f=B l1; b -ctrl+isb-> c:fetch f=B l0;
initial state -irf-> c.


Here Thread 1 has a control dependency and an instruction synchronisation 
barrier (the CBNZ conditional branch, dependent on the value read by its LDR 
load, and ISB), abbreviated to ctrl+isb, between its load and the fetch from f. If 
the latter were a data load, this would ensure the two loads are satisfied in order. 
This is not explicit in the existing prose, but it is what one would expect, and it 
is observed in practice. Microarchitecturally, it is easily explained by an out-of- 
date entry for f in the instruction cache of Thread 1: if Thread 1 had previously 
fetched f (perhaps speculatively), and that instruction cache entry has not been 
evicted or explicitly invalidated since, then this fetch of f will simply read the 
old value from the instruction cache without going out to data memory. The ISB 
ensures that f is freshly fetched, but does not ensure that Thread 1’s instruction 
cache is up-to-date with respect to data memory. 


3.3 Instruction Synchronisation 


Instruction fetches satisfy few guarantees, so explicit synchronisation must be 
performed when modifying the instruction stream. 


Same-Thread Synchronisation Test SM below shows the simplest self- 
modifying code case: without additional synchronisation, a write to program 
memory can be ignored by a program-order-later fetch. 


SM                                                               AArch64
Initial state: 0:W0="B l1", 0:X1=f

Thread 0            | Common
STR W0,[X1] // a    | f:  B l0
BL f                | l1: MOV X10,#2
MOV X0,X10          |     RET
                    | l0: MOV X10,#1
                    |     RET

Allowed: 1:X0=1

Execution: a:write f=B l1; initial state -irf-> b:fetch f=B l0; b -ifr-> a.


In this execution, the fetch b, fetching the instruction at f, fetches a value 
from a write coherence-before a, even though b is the fetch of an instruction 
program-order after a. We illustrate this with an instruction from-reads (ifr) 
edge. This is a derived relation, analogous to the usual from-reads (fr) relation, 
that relates each fetch to all writes that are coherence-after the write it read 
from; it is defined as ifr = irf^{-1}; co. If the fetch were a data read, this would 
be a forbidden coherence shape (CoWR). As it is, it is architecturally allowed, 
as described explicitly by Arm [9, B2.4.4], and it is experimentally observed on 
all devices we have tested. Microarchitecturally, this too is simply due to fetches 
from old instruction cache entries. 
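
To make the derived relation concrete, the following Python sketch computes ifr for the SM execution from irf and co, represented as sets of pairs. The event names ("init" for the initial write of f, "a", "b") and the helper functions are ours, chosen to match the labels in the test above.

```python
def inverse(rel):
    """Inverse of a binary relation given as a set of (source, target) pairs."""
    return {(y, x) for (x, y) in rel}


def compose(r, s):
    """Relational composition r;s: x to z whenever x r y and y s z for some y."""
    return {(x, z) for (x, y1) in r for (y2, z) in s if y1 == y2}


# SM execution: the fetch b reads the initial write of f, which is coherence-before a.
irf = {("init", "b")}
co = {("init", "a")}

# ifr relates each fetch to the writes coherence-after the write it read from.
ifr = compose(inverse(irf), co)
```

Here ifr contains exactly the pair (b, a), matching the edge drawn in the SM execution.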


Cache Maintenance As we saw in §2, the Arm architecture provides cache 
maintenance instructions to synchronise the instruction and data streams: the 
DC data-cache clean and IC instruction-cache invalidate instructions. To forbid 
the relaxed outcome of SM, by forcing a fetch of the modified code, the specified 
sequence of cache maintenance instructions must be inserted, with an ISB. 


SM+cachesync-isb                                                 AArch64
Initial state: 0:W0="B l1", 0:X1=f

Thread 0
STR W0,[X1]    // overwrite f with branch
DC CVAU,X1     // clean data cache
DSB ISH
IC IVAU,X1     // invalidate instruction cache
DSB ISH
ISB            // flush pipeline
BL f
MOV X0,X10

Forbidden: 1:X0=1

Execution: a:write f=B l1 -cachesync-> b:ISB; b -isb-> c:fetch f=B l0;
initial state -irf-> c.


Now the outcome is forbidden. The cache synchronisation sequence DC CVAU; 
DSB ISH; IC IVAU; DSB ISH (which we abbreviate to a single cachesync edge) 
ensures that by the time the ISB executes, the instruction and data memory have 
been made coherent with each other for f. The ISB then ensures the final fetch 
of f is ordered after this sequence. The microarchitectural intuition for this was 
in §2; our §4 operational model will describe the semantics of each instruction. 


Cross-Thread Synchronisation We now consider modifying code that can be 
fetched by other threads, using variants of the standard message-passing shape 
MP. That checks whether two writes (to different locations) on one thread can 
be seen out-of-order by two reads on another thread; here we replace one or both 
of those reads by fetches, and ask what synchronisation is required to ensure that 
the relaxed outcome is forbidden. Consider first an MP variant where the first 
write is of a new instruction, and the second is just a simple data memory flag: 


MP.RF+dmb+ctrl-isb                                               AArch64
Initial state: 0:W0="B l1", 0:X1=f, 0:X2=1, 0:X3=x, 1:X2=x, [x]=0

Thread 0        | Thread 1
STR W0,[X1]     | LDR X0,[X2]
DMB ISH         | CBNZ X0,l
STR X2,[X3]     | l: ISB
                | BL f
                | MOV X1,X10

Allowed: 1:X0=1, 1:X1=1

Execution: a:write f=B l1 -dmb-> b:write x=1; b -rf-> c:read x=1;
c -ctrl-> d:ISB; d -isb-> e:fetch f=B l0; initial state -irf-> e.


This test includes sufficient synchronisation on each thread to enforce thread- 
local ordering of data accesses: the DMB in Thread 0 ensures the writes a and b 
propagate to memory in program order, and the control-dependency into an ISB 
on Thread 1 ensures the read c and the fetch e happen in program order. How- 
ever, as we saw in §2, this is not enough to synchronise concurrent modification 
and execution of code in ARMv8-A. Thread 0 needs the entire cache synchro- 
nization sequence (giving test MP.RF+cachesync+ctrl-isb, not shown), not just 
a DMB, to forbid this outcome. 


Another variant of this MP-shape test where the message passing itself is 
done using modification of code gives a much stronger guarantee, as can be 
seen from the following MP.FR+dmb+fpo-fe test. This is not clear from the 
architecture manual, but this outcome is already forbidden with only the DMB. 
This is for similar reasons to the above CoFR test: since Thread 1 fetched the 
updated value for f, we know that value must have reached at least the data 
caches (since that is where the instruction cache reads from) and therefore
multi-copy atomicity guarantees that a normal load instruction will observe it. 


MP.FR+dmb+fpo-fe                                                 AArch64
Initial state: 0:X0=1, 0:X1=x, 1:X2=x, [x]=0, 0:W2="B l1", 0:X3=f

Thread 0        | Thread 1
STR X0,[X1]     | BL f
DMB ISH         | MOV X0,X10
STR W2,[X3]     | LDR X1,[X2]

Forbidden: 1:X0=2, 1:X1=0

Execution: a:write x=1 -dmb-> b:write f=B l1; b -irf-> c:fetch f=B l1;
c -fpo-> d:fetch LDR X1,[X2]; d -fe-> e:read x=0.


The final variant of these MP-shaped tests has both Thread 0 writes be of new 
instructions. This idiom is very common in practice; it is currently how Chrome’s 
WebAssembly JIT synchronises the modified thread with the new code. 


MP.FF+dmb+fpo                                                    AArch64
Initial state: 0:W0="B l1", 0:X1=f1, 0:W2="B l1", 0:X3=f2

Thread 0        | Thread 1
STR W0,[X1]     | BL f2
DMB ISH         | MOV X0,X10
STR W2,[X3]     | BL f1
                | MOV X1,X10

Allowed: 1:X0=2, 1:X1=1

Execution: a:write f1=B l1 -dmb-> b:write f2=B l1; b -irf-> c:fetch f2=B l1;
c -fpo-> d:fetch f1=B l0; initial state -irf-> d.


Without the full cachesync sequence on Thread 0, this is an allowed 
outcome. Interestingly, adding the cachesync sequence to Thread 0 (Test 
MP.FF+cachesync+fpo, not shown) is sufficient to make the outcome forbidden,
without an ISB in Thread 1, as the cachesync sequence is intended to make 
it appear that fetches occur in program order. Microarchitecturally, that could 
be ensured in two ways: either by actually fetching in-order, or by making the 
IC instruction not only invalidate all the instruction caches (for this address) 
but also clear any stale entries (for this address) from every core’s pre-fetch
buffer. Architecturally, this is not clear in the current prose, but, concurrent
with this work, Arm were independently strengthening their definition to make it so. 


Incremental Synchronisation The cache synchronisation sequence need not 
be contiguous, or even all in the same thread. So long as the sequence in its 
entirety has been performed by the time the fetch happens, then the instruction 
stream will have been made consistent with the data stream for that address. 

This is demonstrated by the following test, where Thread 0 performs a write 
to f and then only a DC before synchronizing with Thread 1, which performs the 
IC, while Thread 2 observes the modified code. This can happen in practice when 
a software thread is migrated between hardware threads at runtime, by a hyper- 
visor or OS. Thread 0 and Thread 1 may just represent the runtime scheduling 
of a single process, beginning execution on hardware Thread 0 but migrated to 
hardware Thread 1 between the DC and IC instructions. In the graph, the dcsync 
and icsync edges represent the DC;DSB ISH and DSB ISH;IC;DSB ISH combinations. The 
DC does not need a preceding DSB ISH because it is ordered w.r.t. the preceding 
store to the same cache line. 

Here the IC gets broadcast to all threads [9, B2.2.5p3], and so the fact that 
it happens on a different thread to the DC does not affect the outcome. Similarly, 
if the DC were to happen on another thread first (to get the test
MP.RF+[dc]-ic+ctrl-isb, not shown), then it would have the effect of ensuring consistency 
globally, for all threads. 


ISA2.F+dc+ic+ctrl-isb                                            AArch64
Initial state: 0:W0="B l1", 0:X1=f, 0:X2=1, 0:X3=x, [x]=0, 1:X4=f,
1:X1=x, 1:X2=1, 1:X3=y, [y]=0, 2:X2=y

Thread 0        | Thread 1        | Thread 2
STR W0,[X1]     | LDR X0,[X1]     | LDR X0,[X2]
DC CVAU,X1      | DSB ISH         | CBZ X0,l
DSB ISH         | IC IVAU,X4      | l: ISB
STR X2,[X3]     | DSB ISH         | BL f
                | STR X2,[X3]     | MOV X1,X10

Forbidden: 1:X0=1, 1:X1=1

Execution: a:write f=B l1 -dcsync-> b:write x=1; b -rf-> c:read x=1;
c -icsync-> d:write y=1; d -rf-> e:read y=1; e -ctrl-> f:ISB;
f -isb-> g:fetch f=B l0; g -ifr-> a.


3.4 Multi-Copy Atomicity 


For data accesses, the question of whether they are multi-copy atomic is a crucial 
one for relaxed architectures. IBM POWER, ARMv7, and pre-2018 ARMv8-A 
are/were non-multi-copy atomic: two writes to different addresses could become 
visible to distinct other threads in different orders. Post-2018 ARMv8-A and 
RISC-V are multi-copy atomic (or “other multi-copy-atomic” in Arm terminol- 
ogy) [37,36,9]: the programmer can assume there is a single shared memory, with 
all relaxed-memory effects due to thread-local out-of-order execution. 

However, for fetches, due to the lack of any fetch atomicity guarantee for most 
instructions (§3.1), and the lack of coherent fetches for the others (§3.2), the 
question of multi-copy atomicity is not particularly interesting. Tests are either 
trivially forbidden (by data-to-instruction coherence) or are allowed but only the 
full cache synchronisation sequence provides enough guarantees to forbid it, and 
(§3.3) this ensures all cores will share the same consistent view of memory. 


3.5 Strength of the IC Instruction 


Multiple Points of Unification Cleaning the data cache, using the DC in- 
struction, makes a write visible to instruction memory. It does this by pushing 
the write past the Point of Unification. However, there may be multiple Points 
of Unification: one for each core, where its own instruction and data memory 
become unified, and one for the entire system (or shareability domain) where all 
the caches unify. Fetching from a write implies that it has reached the closest 
PoU, but does not imply it has reached any others, even if the write originated 
from a distant core. Consider the following test. Here Thread 0 modifies f,
Thread 1 fetches the new value and performs just an IC and DSB, before signalling
Thread 0, which also fetches f. That IC is not strong enough to ensure that the
write is pulled into the instruction cache of Thread 0. 


SM.F+ic                                                          AArch64
Initial state: 0:W0="B l1", 0:X4=f, 0:X3=x, [x]=0, 1:X4=f, 1:X2=1, 1:X3=x

Thread 0        | Thread 1
STR W0,[X4]     | BL f
LDR X2,[X3]     | MOV X0,X10
CBZ X2,l        | IC IVAU,X4
l: ISB          | DSB ISH
BL f            | STR X2,[X3]
MOV X1,X10      |

Allowed: 1:X0=2, 0:X2=1, 0:X1=1

Execution: a:write f=B l1 -irf-> e:fetch f=B l1; e -icsync-> f:write x=1;
f -rf-> b:read x=1; a -po-> b; b -ctrl-> c:ISB; c -isb-> d:fetch f=B l0;
initial state -irf-> d.


This is not clear in the existing prose, but the architectural intent is that it 
be allowed (i.e., that IC is weak in this respect). We have not so far observed it 
in practice. The write may have passed the Point of Unification for Thread 1, 
but not the shared Point of Unification for both threads. In other words, the 
write might reach Thread 1’s instruction cache without being pushed down from 
Thread 0’s data cache. Microarchitecturally this can be explained by direct data 
intervention (DDI), an optimisation allowing cache lines to be migrated directly 
from one thread’s (data) cache to another. The line could be migrated from 
Thread 0 to Thread 1, then pushed past Thread 1’s Point of Unification, making 
it visible to Thread 1’s instruction memory without ever making it visible to 
Thread 0’s own instruction memory. The lack of coherence between instruction 
and data caches would make this observable, even in multi-copy atomic machines. 


Stale Fetches So far, we have only talked about fetching from two distinct 
writes. But theoretically there is no limit to how far back we can fetch from, 
with insufficient synchronization. The MP.RF+dmb+ctrl-isb test (§3.3) required 
the full cachesync sequence to forbid the given behaviour. Below we give a test, 
FOW, similar to that MP-shaped test but allowing many consumer threads 
to independently and simultaneously see different values in their instruction 
memory, even after invalidating their caches. 

FOW                                                              AArch64
Initial state: 0:W0="B l1", 0:X2=g, 0:W1="B l2", 0:X3=1, 0:X4=x, [x]=0,
1:X4=x, 2:X4=x

Thread 0        | Thread 1        | Thread 2        | Common
STR W0,[X2]     | LDR X0,[X4]     | LDR X0,[X4]     | g:  B l0
STR W1,[X2]     | CBNZ X0,la      | CBNZ X0,lb      | l2: MOV X10,#3
DSB ISH         | la: ISB         | lb: ISB         |     RET
IC IVAU,X2      | BL g            | BL g            | l1: MOV X10,#2
DSB ISH         | MOV X1,X10      | MOV X1,X10      |     RET
STR X3,[X4]     |                 |                 | l0: MOV X10,#1
                |                 |                 |     RET

Allowed: 1:X0=1, 1:X1=2, 2:X0=1, 2:X1=1

Execution: a:write g=B l1 -po-> b:write g=B l2; b -icsync-> c:write x=1;
c -rf-> d:read x=1; c -rf-> f:read x=1; d -ctrl+isb-> e:fetch g=B l1;
f -ctrl+isb-> g:fetch g=B l0; a -irf-> e; initial state -irf-> g.


This is not clear in the existing architecture text. It is a case where the architec- 
ture design is not very constrained. On the one hand, it has not been observed, 
and it is thought unlikely that hardware will ever exhibit this behaviour: it would 
require keeping multiple writes in the coherent part of the data caches, rather 
than a single dirty line, which would require more complex cache coherence pro- 
tocols. On the other hand, there does not seem to be any benefit to software from 
forbidding it. Arm therefore prefer the choice that gives a simpler and weaker 
model (here the two happen to coincide), to make it easier to understand and to 
provide more flexibility for future microarchitectural optimisations. We therefore 
design our models to allow the above behaviour. 


3.6 Strength of the DC Instruction 


Instruction Cache depth Test CoFF (§3.2) showed that fetches can see “old” 
writes. In principle, there is no limit to the depth of the instruction-cache hier- 
archy: there could be many values for a single location cached in the instruction 
memory for each core, even if the data cache has been cleaned. The test below 
illustrates this, with Thread 1 able to see all three values for g. 


MP.RF+dc+ctrl-isb-isb                                            AArch64
Initial state: 0:W0="B l1", 0:X2=g, 0:W1="B l2", 0:X3=1, 0:X4=x, [x]=0, 1:X4=x

Thread 0        | Thread 1        | Common
STR W0,[X2]     | LDR X0,[X4]     | g:  B l0
STR W1,[X2]     | CBNZ X0,l       | l2: MOV X10,#3
DSB ISH         | l: ISB          |     RET
DC CVAU,X2      | BL g            | l1: MOV X10,#2
DSB ISH         | MOV X1,X10      |     RET
STR X3,[X4]     | ISB             | l0: MOV X10,#1
                | BL g            |     RET
                | MOV X2,X10      |
                | ISB             |
                | BL g            |
                | MOV X3,X10      |

Allowed: 1:X0=1, 1:X1=3, 1:X2=2, 1:X3=1

Execution: a:write g=B l1 -po-> b:write g=B l2; b -dcsync-> c:write x=1;
c -rf-> d:read x=1; d -ctrl+isb-> e:fetch g=B l2; e -isb-> f:fetch g=B l1;
f -isb-> g:fetch g=B l0; b -irf-> e; a -irf-> f; initial state -irf-> g.


This is similar to the preceding FOW case: it is thought unlikely that hardware 
will exhibit this in practice, but the desire for the simpler and weaker option 
means the architectural intent is to allow it, and we follow that in our models. 


4 An Operational Semantics for Instruction Fetch 


Previous work on operational models for IBM POWER and Arm “user- 
mode” concurrency [46,45,22,18,19,37] has shown, surprisingly, that as far as 
programmer-visible behaviour is concerned, one can abstract from almost all 
hardware implementation details of data memory (store queues, the cache hi- 
erarchy, the cache protocol, etc.). For ARMv8-A, following their 2018 shift to 
a multicopy-atomic architecture, one can do so completely: the Flat model of 
[37] has a shared flat memory, with a per-thread out-of-order thread subsystem, 
modelling pipeline effects, responsible for all observable relaxed behaviour. For 
instruction-fetch, it is no longer possible to abstract completely from the data 
and instruction cache hierarchy, but we can still abstract from much of it. 


The Flat Model is a small-step operational semantics for multi-copy atomic 
ARMv8-A, including the relaxed behaviours of loads and stores [37]. Its states are 
abstract machine states consisting of a tree of instructions for each thread, and 
a flat memory subsystem shared by all threads. Each instruction in each thread 
corresponds to a sequence of transitions, with some guards and a potential effect 
on the shared memory state. The Flat model is made executable in our RMEM 
tool, which can exhaustively interleave transitions to enumerate all the possible 
behaviours. The tree of instructions for each thread models out-of-order and 
speculative execution explicitly. Below we show an example for a thread that is 
executing 10 instruction instances. 


[Figure: example instruction tree for one thread.]

Some (grey) are finished, no longer subject to restart; others (pink) have run
some but perhaps not all of their instruction semantics; instructions are not
necessarily atomic. Those with multiple children are branch instructions with
multiple potential successors speculated simultaneously. 

For each state, the model defines the set of allowed transitions, each of which 
steps to a new machine state. Transitions correspond to steps of single instruc- 
tions, and individual instructions may give rise to many. Example transitions 
include Register Write, Propagate Write to Memory, etc. 
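
To give a feel for what exhaustively interleaving transitions means, the following Python sketch explores a deliberately tiny toy machine for the SM test of §3.3. It is not the Flat model or RMEM: the state layout, the three transitions, and the function names are all simplifying assumptions made for illustration.

```python
def explore(initial, step, is_final):
    """Depth-first enumeration of all reachable states; returns the final ones."""
    seen, stack, finals = set(), [initial], set()
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        if is_final(state):
            finals.add(state)
        stack.extend(step(state))
    return finals


def step(state):
    """Toy transitions. State = (pc, memory value of f, icache value of f, fetched)."""
    pc, mem, icache, fetched = state
    successors = []
    if pc == 0:                      # thread writes the new instruction to f
        successors.append((1, "B l1", icache, fetched))
    if pc == 1:                      # thread fetches f from its instruction cache
        successors.append((2, mem, icache, icache))
    if icache != mem:                # the instruction cache may update at any time
        successors.append((pc, mem, mem, fetched))
    return successors


initial = (0, "B l0", "B l0", None)
finals = explore(initial, step, lambda s: s[0] == 2)
outcomes = {s[3] for s in finals}    # which instruction the fetch can observe
```

Without any cache maintenance, both the old and the new instruction are reachable outcomes of the fetch, as in the SM test.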


iFlat Extension Originally, Flat had a fixed instruction memory, with a single
transition that can speculate the address of any program-order successor of any
instruction in flight, fetch it from the fixed instruction memory, and decode it.
We now remove that fixed instruction memory, so that instructions can be fetched
from data writes, and add the additional structures shown in the figure. These
are all of unbounded size, as is appropriate for an architecture definition.

[Figure: the iFlat structures. Each thread has a Fetch Queue (new fetch requests
in, fetched entries out to decode) fed by an Abstract I$; writes go to a shared
D$ buffer above Memory, from which data reads see the most recent write and
fetches may see any write.]


Fetch Queues (per-thread) These are ordered buffers of pre-fetched entries, 
waiting to be decoded and begin execution. Entries are either a fetched 32-bit 
opcode, or an unfetched request. The fetch queues allow the model to speculate 
and pre-fetch many instructions ahead of where the thread is currently executing. 
The model’s fetch queues abstract from multiple real-hardware structures: in- 
struction queues, line-fill buffers, loop buffers, and slots objects. We keep a close 
relation to this underlying microarchitecture by allowing out-of-order fetches, 
but we believe this is not experimentally observable on real hardware. 


Abstract Instruction Caches (per-thread) These are just sets of writes. 
When the fetch queue requests a new entry, it gets satisfied from the instruction 
cache, either immediately (a hit) or at some later point in time (a miss). The 
instruction cache can contain many possible writes for each location (§3.6), and 
it can be spontaneously updated with new writes in the system at any time ([9, 
B2.4.4]). To manage IC instructions, each thread keeps a list of addresses yet to 
be invalidated by in-flight ICs. 


Data Cache (global) Above the single shared flat memory for the entire
system, which sufficed for the multi-copy-atomic ARMv8-A data memory, we insert 
a shared buffer which is just a list of writes, abstracting from the many possible 
coherent data cache hierarchies. Data reads must be coherent, reading from the 
most recent write to the same address in the buffer, but instruction fetches are 
allowed to read from any such write in the buffer (§3.2). 
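
The asymmetry between data reads and instruction fetches can be sketched as follows. This Python sketch is ours, not the Lem model: the class name, the inclusion of the flat-memory value among the fetch candidates, and the way clean drains buffered writes into memory are assumptions made to keep the example self-contained.

```python
class DataSide:
    """Toy shared data side: a flat memory plus an ordered buffer of writes."""

    def __init__(self, memory):
        self.memory = dict(memory)   # flat memory: address -> value
        self.buffer = []             # list of (address, value), oldest first

    def write(self, addr, value):
        self.buffer.append((addr, value))

    def read(self, addr):
        """Data reads are coherent: take the most recent buffered write, else memory."""
        for a, v in reversed(self.buffer):
            if a == addr:
                return v
        return self.memory[addr]

    def fetch_candidates(self, addr):
        """Fetches may see any buffered write to addr (or, we assume, the memory value)."""
        return [self.memory[addr]] + [v for a, v in self.buffer if a == addr]

    def clean(self, addr):
        """Assumed effect of a DC: drain buffered writes to addr into memory, in order."""
        for a, v in self.buffer:
            if a == addr:
                self.memory[a] = v
        self.buffer = [(a, v) for a, v in self.buffer if a != addr]


d = DataSide({"g": "B l0"})
d.write("g", "B l1")
d.write("g", "B l2")
before = d.fetch_candidates("g")     # old and new values are all candidates
d.clean("g")
after = d.fetch_candidates("g")      # only the newest value remains on the data side
```

Note that this only models the data side: as §3.6 shows, a thread's own instruction cache may still hold older values after the clean.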


Transitions To accommodate instruction fetch and cache maintenance, we in- 
troduce new transitions: Fetch Request, Fetch Instruction, Fetch Instruction 
(Unpredictable), Fetch Instruction (B.cond), Decode Instruction, Begin IC, 
Propagate IC to Thread, Complete IC, Perform DC, and Update Instruction 
Cache. We also have to modify some Flat transitions: Commit ISB, Wait for 
DSB, Commit DSB, Propagate Memory Write, and Satisfy Read from Memory. 
These transitions define the lifecycle of each instruction: a request gets issued 
for the fetch, then at some later point the fetch gets satisfied from the instruc- 
tion cache, the instruction is then decoded (in program-order) and then handed 
to the existing semantics to be executed. To give a flavour, we show just one, 
the Propagate IC to Thread transition, which is responsible for invalidation of 
the abstract instruction caches. This is a prose rendering of the rule in our exe- 
cutable mathematical model, which is expressed in the typed functional subset 
of Lem [32]. 


Propagate IC to Thread An instruction i (with ID iid) in state 
WAIT_IC(address, state_cont) can do the relevant invalidate for any thread 
tid’, modifying that thread’s instruction cache and fetch queue, if there exists 
a pending entry (iid, address) in that thread’s ic_writes. Action: 


1. for any entry in the fetch queue for thread tid’, whose program_loc is 
in the same minimum-size instruction cache line as address, and is in 
FETCHED(_) state, set it to the UNFETCHED state; 

2. for the instruction cache of thread tid’, remove any write-slices which are 
in the same instruction cache line of minimum size as address. 
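
The two actions above can be sketched in a few lines. This is a simplified illustration, not the Lem definition: the 64-byte line size, the `Thread` and `FetchEntry` records and the representation of write-slices as (address, value) pairs are all assumptions made for the example. Removing the pending entry follows the description of entries being "discharged" by this transition.

```python
from dataclasses import dataclass, field
from typing import Optional

LINE_SIZE = 64  # assumed minimum instruction cache line size, in bytes


def same_line(a, b, line_size=LINE_SIZE):
    """True if addresses a and b fall in the same minimum-size cache line."""
    return a // line_size == b // line_size


@dataclass
class FetchEntry:
    program_loc: int
    state: str = "UNFETCHED"       # "UNFETCHED" or "FETCHED"
    opcode: Optional[int] = None   # only meaningful in the FETCHED state


@dataclass
class Thread:
    fetch_queue: list = field(default_factory=list)
    icache: list = field(default_factory=list)    # (address, value) write-slices
    ic_writes: set = field(default_factory=set)   # pending (iid, address) entries


def propagate_ic_to_thread(thread, iid, address):
    """Apply the invalidate to one thread; returns False if it is not enabled."""
    if (iid, address) not in thread.ic_writes:
        return False
    # Action 1: fetched entries in the invalidated line go back to UNFETCHED.
    for entry in thread.fetch_queue:
        if entry.state == "FETCHED" and same_line(entry.program_loc, address):
            entry.state, entry.opcode = "UNFETCHED", None
    # Action 2: drop write-slices in the invalidated line from the cache.
    thread.icache = [(a, v) for (a, v) in thread.icache
                     if not same_line(a, address)]
    # The pending entry is now discharged.
    thread.ic_writes.discard((iid, address))
    return True
```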

This rule can be found under the same name in the full prose description, 
and in the handle_ic_ivau and flat_propagate_cache_maintenance functions 
in machineDefThreadSubsystem.lem and machineDefFlatStorageSubsystem.lem 
in the executable mathematics. Cache maintenance operations work over entire 
cache lines, not individual addresses. Each address is associated with at least one 
cache line for the data (and unified) caches, and one for the instruction caches. 
The cache line of minimum size is the (architected) smallest possible cache line 
for each of these. 


Example This model correctly explains all the behaviours of §3. We illustrate 
this by revisiting the cache synchronization explanation of §2, which can now 
be re-interpreted w.r.t. our precise model, and using this to explain the thread 
migration case of §3.3. Given DC Xn; DSB; IC Xn; DSB we can use this model 
to give meaning to it (omitting uninteresting transitions): First the DC CVAU 
causes a Perform DC transition. This pushes any write that might have been 
in the abstract data cache into memory. Now the first DSB’s Commit DSB can 
be taken, allowing Begin IC to happen. This creates entries for each thread, 
which are discharged by each Propagate IC to Thread (see above). Once all 
entries are invalidated, a Complete IC can happen. Now, if any thread decodes 
an instruction for that address, it must have been fetched from the write the 
DC pushed, or something coherence-after it. If the software thread performing 
this sequence is interrupted and migrated (by the OS) to a different hardware 
thread, then, so long as the OS includes the DSB to maintain the thread-local DC 
ordering, the DC will push the write in an identical way, since it only affects the 
global abstract data cache. The IC transitions can all be taken, and the sequence 
continues as before, just on a new hardware thread. So when the second DSB 
finishes, and the final Commit DSB transition is taken, the effect of the full 
sequence will be seen system-wide even if the thread was migrated. 
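
The effect of the sequence can be mimicked with a deliberately coarse toy. Everything here is an assumption made for illustration: a 64-byte line, a single `invalidate_ic` step standing in for Begin IC, all Propagate IC to Thread transitions and Complete IC, and a `fetchable` function that over-approximates what a thread could fetch (its instruction cache, the abstract data cache, and flat memory).

```python
LINE = 64  # assumed cache line size in bytes


def line(addr):
    return addr // LINE


class System:
    def __init__(self, memory, n_threads):
        self.memory = dict(memory)   # flat memory: address -> value
        self.dcache = []             # abstract data cache: list of (addr, value)
        self.icaches = [set() for _ in range(n_threads)]  # {(addr, value)}

    def write(self, addr, value):
        self.dcache.append((addr, value))

    def visible(self, addr):
        """Values a spontaneous instruction cache update could pick up."""
        vals = {v for (a, v) in self.dcache if a == addr}
        if addr in self.memory:
            vals.add(self.memory[addr])
        return vals

    def update_icache(self, tid, addr):
        self.icaches[tid] |= {(addr, v) for v in self.visible(addr)}

    def fetchable(self, tid, addr):
        cached = {v for (a, v) in self.icaches[tid] if a == addr}
        return cached | self.visible(addr)

    def perform_dc(self, addr):
        """Push buffered writes in addr's line to memory (newest write wins)."""
        keep = []
        for (a, v) in self.dcache:
            if line(a) == line(addr):
                self.memory[a] = v
            else:
                keep.append((a, v))
        self.dcache = keep

    def invalidate_ic(self, addr):
        """Invalidate addr's line in every thread's instruction cache."""
        self.icaches = [{(a, v) for (a, v) in ic if line(a) != line(addr)}
                        for ic in self.icaches]
```

Before the sequence a thread may still fetch the old value; after `perform_dc` and `invalidate_ic` only the new one remains. Both steps act on global state, which is why the result does not depend on which hardware thread issues them.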


5 An Axiomatic Semantics for Instruction Fetch 


Based on the operational model, we develop an axiomatic semantics, as an ex- 
tension of the ARMv8 axiomatic reference model [15,37]. Since that does not 
have mixed-size support, we do not model the concurrent modification of condi- 
tional branches (§3.1), as this would require mixed-size machinery. The existing 
axiomatic model is a predicate on candidate executions, hypothetical complete 
executions of the given program that satisfy some basic well-formedness condi- 
tions, defining the set of valid executions to be those satisfying its axioms. Each 
candidate execution abstractly captures a particular concrete execution of the 
program in terms of events and relations over them. This model is expressed in 
the herd language [8,6,4]. The events of these executions are memory reads (the 
set R), memory writes (W), and memory barrier/fence events (F). The relations 
are: program order (po), capturing the sequencing of events by the same thread in 
the execution’s control-flow unfolding; reads-from (rf), relating a write event w 
with any read event r that reads from it; the coherence order (co), recording the 
execution’s sequencing of same-address writes in memory; and read-modify-write 
(rmw), capturing which load/store exclusive instructions form a successful 
exclusive pair in the execution. The derived relation from-reads fr = rf^-1;co relates 
a read r with a write w’ if r reads from a write w coherence before w’. In addition, 
candidate executions also have relations capturing dependencies between events: 
address (addr), data (data), and control dependencies (ctrl). The relation loc 
relates any two read/write events that are to the same memory address. The 
model also has relations suffixed “i” and “e”: rfi/rfe, coi/coe, fri/fre. These 
are the restrictions of the relations rf, co, and fr, to same-thread/“internal” 
event pairs or different-thread/“external” event pairs. The model is defined in 
relational algebra. In herd, R;S stands for sequential composition of relations R 
and S, R^-1 for the inverse of relation R, R|S and R&S for the union and intersection 
of R and S, and [A];R;[B] for the restriction of R to the domain A and range B. 
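
For readers unfamiliar with the notation, these operators are easy to mimic with sets of pairs. The sketch below is ours, not part of herd; the event names are hypothetical. It computes fr = rf^-1;co for a two-write, one-read example.

```python
def seq(r, s):
    """Sequential composition R;S."""
    return {(a, d) for (a, b) in r for (c, d) in s if b == c}


def inv(r):
    """Inverse R^-1."""
    return {(b, a) for (a, b) in r}


def restrict(dom, r, rng):
    """[A];R;[B]: keep pairs whose source is in dom and target is in rng."""
    return {(a, b) for (a, b) in r if a in dom and b in rng}


# Two writes to the same address in coherence order; r reads from the first.
rf = {("w1", "r")}
co = {("w1", "w2")}
fr = seq(inv(rf), co)  # r reads from w1, which is coherence-before w2
```

Union and intersection are Python's `|` and `&` on sets, matching the herd symbols.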


Handling instruction fetch requires extending the notion of candidate ex- 
ecution. We add new events: an instruction-fetch (IF) event for each executed 
instruction; a DC event for each DC CVAU instruction; an IC event for each IC IVAU 
and IC IALLU instruction. We replace po with fetch-program-order (fpo) which 
orders the IF event of an instruction before any program-order later IF events. 
We add a relation same-cache-line (scl), relating reads, writes, fetches, DC and 
IC events to addresses in the same cache line. We add an acyclic transitively 
closed relation wco, which extends co with orderings for cache maintenance (DC 
or IC) events: it includes an ordering (e, e’) or (e’,e) for any cache maintenance 
event e and same-cache-line event e’ if e’ is a write or another cache 
maintenance event; where co = ([W];wco;[W]) & loc. The loc, addr, and ctrl relations are all 
extended to include DC and IC events. We add a fetch-to-execute relation (fe), 
relating an IF event to any event generated by the execution of that instruction; 
and an instruction-read-from relation (irf), which relates a write to any IF event 
that fetches from it. Finally, we add a boolean constrained-unpredictable (CU) to 
detect badly behaved programs. Now we derive the following relations: the 
standard po relation, as po = fe^-1; fpo; fe (two events e and e’ are po-related if their 
fetch-events are fpo-related); and instruction-from-reads (ifr), the analogue of 
fr for instruction fetches, relating a fetch to all writes coherence-after the one it 
fetched from: ifr = irf^-1;co. 
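
As a small worked instance of these two derivations, consider one thread executing two instructions, the first producing a write event and the second a read event. The event names are made up for the example, and relations are again plain sets of pairs.

```python
def seq(r, s):
    """Sequential composition R;S."""
    return {(a, d) for (a, b) in r for (c, d) in s if b == c}


def inv(r):
    """Inverse R^-1."""
    return {(b, a) for (a, b) in r}


# if1, if2: fetch events of two consecutive instructions on one thread.
fpo = {("if1", "if2")}
# fe links each fetch to the event its instruction generates.
fe = {("if1", "w"), ("if2", "r")}
po = seq(seq(inv(fe), fpo), fe)   # po = fe^-1; fpo; fe

# if2 fetched the old contents of its location, later overwritten by w_new.
irf = {("w_old", "if2")}
co = {("w_old", "w_new")}
ifr = seq(inv(irf), co)           # ifr = irf^-1; co
```

Here po relates the write to the read, and ifr relates the second fetch to the write that is coherence-after the one it fetched from.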


We then make two semantics-preserving rewrites of the existing model to 
make adding instruction fetches easier (described in the appendix); and make 
the following changes and additions to the model. The full model is shown in 
Figure 1, with comments pointing to the relevant locations in the model defini- 
tion. For lack of space we only describe the main addition, the iseq relation, in 
detail (including its correspondence with the operational model of §4); for the 
others we give an overview and refer to the appendix for the full description. 


We define the relation iseq, relating some write w to address x to an IC 
event completing a cache synchronisation sequence (not necessarily on a single 
thread): w is followed by a same-cache line DC event, which is in turn followed 
by a same-cache line IC event. In operational model terms, this captures traces 
that propagated w to memory, subsequently performed a same-cache-line DC, 
and then began an IC (and eagerly propagated the IC to all threads). In any 
state after this sequence it is guaranteed that w, or a coherence-newer same- 
address write, is in the instruction cache of all threads: performing the DC has 
cleared the abstract data cache of writes to x, and the subsequent IC has re- 
moved old instructions for location x from the instruction caches, so that any 
subsequent updates to the instruction caches have been with w, or co-newer 
writes. Adding ifr;iseq to the observed-by relation (obs) (4) relates an 
instruction fetch i to location x to an IC ic if: i fetched from a write w to x, some 
write w’ to x is coherence-after w, and ic completes a cache synchronisation 
sequence (iseq) starting from w’. Then the irreflexive ob axiom requires that i 
must be ordered-before ic (because it would otherwise have fetched w’). We now 
let iseq = [W];(wco&scl);[DC];                   (*1*)
           (wco&scl);[IC]

(* Observed-by *)
let obs = rfe | fr | wco                         (*2*)
        | irf                                    (*3*)
        | (ifr;iseq)                             (*4*)

(* Fetch-ordered-before *)
let fob = [IF]; fpo; [IF]                        (*5*)
        | [IF]; fe                               (*6*)
        | [ISB]; fe^-1; fpo                      (*7*)

(* Dependency-ordered-before *)
let dob = addr | data
        | ctrl; [W]
        | (ctrl | (addr; po)); [ISB]
        (*| [ISB]; po; [R] *)                    (*8*)
        | addr; po; [W]
        | (addr | data); rfi

(* Atomic-ordered-before *)
let aob = rmw
        | [range(rmw)]; rfi; [A|Q]

(* Barrier-ordered-before *)
let bob = [R|W]; po; [dmb.sy]
        | [dmb.sy]; po; [R|W]
        | [L]; po; [A]
        | [R]; po; [dmb.ld]
        | [dmb.ld]; po; [R|W]
        | [A|Q]; po; [R|W]
        | [W]; po; [dmb.st]
        | [dmb.st]; po; [W]
        | [R|W]; po; [L]
        | [R|W|F|DC|IC]; po; [dsb.ish]           (*9*)
        | [dsb.ish]; po; [R|W|F|DC|IC]           (*10*)
        | [dmb.sy]; po; [DC]                     (*11*)

(* Cache-op-ordered-before *)
let cob = [R|W]; (po&scl); [DC]                  (*12*)
        | [DC]; (po&scl); [DC]                   (*13*)

(* Ordered-before *)
let ob = (obs|fob|dob|aob|bob|cob)^+

(* Internal visibility requirement *)
acyclic (po-loc|fr|co|rf) as internal

(* External visibility requirement *)
irreflexive ob as external

(* Atomic *)
empty rmw & (fre; coe) as atomic

(* Constrained unpredictable *)
let cff = ([W];loc;[IF]) \ ob^-1 \ (co;iseq;ob)  (*14*)
cff_bad cff = CU                                 (*15*)

Fig. 1. Axiomatic model 


briefly overview other changes made to the axiomatic model and their intuition. 
We include irf in obs (3): for an instruction to be fetched from a write, the 
write has to have been done before. We add a relation fetch-ordered-before (fob) 
(5-7), which is included in ordered-before. The relation fob includes fpo and fe; 
including fpo (5) requires fetches to be ordered according to their position in the 
control-flow unfolding of the execution; and including the fe (fetch-to-execute) 
relation (6) captures the idea that an instruction must be fetched before it can 
execute; fetches program-order-after an ISB happen after the ISB (or else are 
restarted) (7). For DSB ISH instructions the edge [R|W|F|DC|IC];po;[dsb.ish] 
is included in ob (9): DSB ISHs are ordered with all program-order-preceding 
non-fetch events. Symmetrically, all non-IF events are ordered after program- 
order-preceding dsb.ish events (10). DCs wait for preceding dmb.sy events (11). 
We include the relation cache-op-ordered-before (cob) in ob. This relation orders 
DC instructions with program-order previous reads/writes and other DCs to the 
same cache line (12,13). 
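
To make the role of iseq in the external axiom concrete, the following sketch checks two candidate executions of a single-location example. It is an illustration under strong simplifying assumptions: only the wco, irf and ifr;iseq parts of obs are modelled, all events are taken to be in one cache line, and the edge from the IC to the fetch (which in the full model would arise from barrier and fob edges) is supplied by hand. All event names are hypothetical.

```python
def seq(r, s):
    return {(a, d) for (a, b) in r for (c, d) in s if b == c}


def inv(r):
    return {(b, a) for (a, b) in r}


def restrict(dom, r, rng):
    return {(a, b) for (a, b) in r if a in dom and b in rng}


def closure(r):
    """Transitive closure, by iterating R | R;R to a fixed point."""
    r = set(r)
    while True:
        bigger = r | seq(r, r)
        if bigger == r:
            return r
        r = bigger


W, DC, IC = {"w0", "w1"}, {"dc"}, {"ic"}
events = W | DC | IC | {"i"}
co = {("w0", "w1")}                                # w0 is the initial write
wco = closure(co | {("w1", "dc"), ("dc", "ic")})   # w1, then DC, then IC
scl = {(a, b) for a in events for b in events}     # one cache line (assumed)

wco_scl = wco & scl
iseq = seq(restrict(W, wco_scl, DC), restrict(DC, wco_scl, IC))


def externally_consistent(irf, other_ob):
    """Irreflexivity of the transitive closure of the modelled ob edges."""
    ifr = seq(inv(irf), co)
    ob = closure(wco | irf | seq(ifr, iseq) | other_ob)
    return all(a != b for (a, b) in ob)


after_sync = {("ic", "i")}   # assumed: the fetch i is ordered after the IC
stale = externally_consistent({("w0", "i")}, after_sync)   # i fetched w0
fresh = externally_consistent({("w1", "i")}, after_sync)   # i fetched w1
```

The stale execution is rejected because ifr;iseq orders i before ic while the assumed edge orders ic before i, giving a cycle; the fresh execution has no such cycle.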


Finally, could-fetch-from (cff) (14) captures, for each fetch i, the writes it 
could have fetched from (including the one it did fetch from), which we use to 
define the constrained unpredictable axiom cff_bad (not given) (15). 


6 Validation 


To gain confidence in the presented models we validated the models against the 
Arm architectural intent, against each other, and against real hardware. 


Validation against the Architecture To ensure our models correctly 
captured the architectural intent we engaged in detailed discussions with Arm, 
including the Arm chief architect. These involved inventing litmus tests (including 
those described in §3 and many others) and discussing what the architecture 
should allow in each case. 


Validating against hardware To run instruction-fetch tests on hardware, we 
extended the litmus tool [7]. The most significant extension consists in handling 
code that can be modified, and thus has to be restored between experiments. To 
that end, code copies are executed; those copies reside in mmap’d memory with 
execute permission granted. Copies are made from “master” copies, in effect 
C functions whose contents basically consist of gcc extended inline assembly. Of 
course, such code has to be position independent, and explicit code addresses in 
test initialisation sections (such as in 0:X1=1 in the test of §3.1) are specific to 
each copy. The cache handling instructions used in our experiments are all 
allowed to execute at exception level 0 (user-mode), and therefore no additional 
privilege is needed to run the tests. 

To automatically generate families of interesting instruction-fetch tests, we 
extended the diy test generation tool [3] to support instruction-fetch reads- 
from (irf) and instruction-fetch from-reads (ifr) edges, in both internal (same- 
thread) and external (inter-thread) forms, and the cachesync edge. We used this 
to generate 1456 tests involving those edges together with po, rf, fr, addr, ctrl, 
ctrlisb, and dmb.sy. diy does not currently support bare DC or IC instructions, 
locations which are both fetched and read from, or repeated fetches from the 
same location. 

We then ran the diy-generated test suite on a range of hardware implemen- 
tations, to collect a substantial sample of actual hardware behaviour. 


Correspondence between the models We experimentally test the equiva- 
lence of the operational and axiomatic models on the above hand-written and 
diy-generated tests, checking that the models give the same sets of allowed final 
states, and that these are consistent with the hardware observations. 


Making the models executable as a test oracle To make the operational 
model executable as a test oracle, capable of computing the set of all allowed 
executions of a litmus test, we must be able to exhaustively enumerate all possible 
traces. For the model as presented, doing this naively is infeasible: for each 
instruction it is theoretically possible to speculate any of the 2^64 addresses as 
potential next address, and the interleaving of the new fetch transitions with 
others leads to an additional combinatorial explosion. 

We address these with two new optimisations. First, we extend the fixed-point 
optimisation in RMEM (incrementally computing the set of possible branch tar- 
gets) [37] to keep track not only of indirect branches but also the successors of 
every program location, and only allow speculating from this set of successors. 
Additionally, we track during a test which locations were both fetched and mod- 
ified during the test, and eagerly take fetch and decode transitions for all other 
locations. As before, the search then runs until the set of branch targets and 
the set of modified program-locations reach a fixed point. We also take some 
of the transitions eagerly to reduce the search space, in cases where this cannot 
remove behaviour: Wait for IC, Complete IC, Fetch Request, and Update 
Instruction Cache. 
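
The shape of the fixed-point loop can be sketched generically. This is not RMEM's implementation: `fixed_point`, `toy_search` and the `SUCCESSORS` table are hypothetical stand-ins, where `toy_search` plays the role of one exhaustive search restricted to the currently known set and returns whatever new locations that search encountered.

```python
def fixed_point(search, initial):
    """Re-run search with a growing set until it discovers nothing new."""
    known = set(initial)
    while True:
        discovered = search(frozenset(known))
        if discovered <= known:   # nothing new: fixed point reached
            return known
        known |= discovered


# Toy program: each location maps to its possible successor locations.
SUCCESSORS = {0: {4}, 4: {8, 16}, 8: set(), 16: {0}}


def toy_search(known):
    """Stand-in for one model search that only speculates within `known`."""
    found = set()
    for loc in known:
        found |= SUCCESSORS.get(loc, set())
    return found


targets = fixed_point(toy_search, {0})
```

Each round only ever speculates to locations already known to be reachable, which is what keeps the per-round search finite.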


Making the axiomatic model executable as a test oracle The axiomatic 
model is expressed in a herd-like form, but the herd tool does not support instruc- 
tion fetch and cache maintenance instructions. To make the model executable 
as a test oracle, we built a new tool that takes litmus tests and uses a Sail [11] 
definition of a fragment of the ARMv8-A ISA to generate SMT problems for the 
model. Using the Sail instruction semantics, we generate a Sail program that cor- 
responds to each thread within a litmus test. The tool then partially evaluates 
these programs using the concrete values for addresses and registers specified in 
the litmus file, while allowing memory values and arbitrary addresses to remain 
symbolic. Using a Sail to SMT-LIB backend, these are translated into SMT defi- 
nitions that include all possible behaviours of each thread as satisfiable solutions. 
The rules for the axiomatic model are then applied as assertions restricting the 
possible behaviours to just those allowed by the axiomatic model. The tool also 
derives the addr and data relations, using the syntactic dependencies within the 
instruction semantics to derive the syntactic dependencies between instructions. 


For litmus tests, where we can know up-front which instructions may be 
modified, we would like to avoid generating IF events for instructions that cannot 
be modified. If we naively removed certain IF events, however, we would break 
the correspondence between po and fe^-1;fpo;fe. This can be worked around 
by ensuring that every modifiable instruction generates an event which appears 
in po, allowing fpo between the modifiable instructions to instead be derived 
as fe;po;fe^-1. Branches emit a special branch address announce event for this 
purpose, which is also used to derive the ctrl relation. The fob relation can 
then be modified, replacing [ISB];fe^-1;fpo with [ISB];po;fe^-1 and adding 
[ISB];po. The second change ensures that all the transitive edges generated by 
[ISB];fe^-1;fpo followed by [IF];fe remain within fob and hence ob. 


A limitation of this approach is that it cannot support cases where two threads 
both attempt to execute the same possibly-modified instruction, as in the 
SM.F+ic and FOW tests. 


Validation results First, to check for regressions, we ran the operational model 
on all the 8950 non-mixed-size tests used for developing the original Flat model 
(without instruction fetch or cache maintenance). The results are identical, ex- 
cept for 23 tests which did not terminate within two hours. We used a 160 
hardware-thread POWERS server to run the tests. 

We have also run the axiomatic model on the 90 basic two-thread tests that 
do not use Arm release/acquire instructions (not supported by the ISA semantics 
used for this); the results are all as they should be. This takes around 30 minutes 
on 8 cores of a Xeon Gold 6140. 

Then, for the key handwritten tests mentioned in this paper, together with 
some others (that have also been discussed with Arm), we ran them on various 
hardware implementations and in the operational and axiomatic models. The 
models’ results are identical to the Arm architectural intent in all cases, except 
for two tests which are not currently supported by the axiomatic checker. 


Test                                  Arm intent  op. model  ax. model    hardware obs. 
CoFF                                  allow       =          =            42.6k/13G 
CoFR                                  forbid      =          =            0/13G 
CoRF+ctrl-isb                         allow       =          =            3.02G/13G 
SM                                    allow       =          =            25.8G/25.9G 
SM+cachesync-isb                      forbid      =          =            0/25.9G 
MP.RF+dmb+ctrl-isb                    allow       =          =            480M/6.36G 
MP.RF+cachesync+ctrl-isb              forbid      =          =            0/13G 
MP.FR+dmb+fpo-fe                      forbid      =          =            0/13G 
MP.FF+dmb+fpo                         allow       =          =            447M/13G 
MP.FF+cachesync+fpo                   forbid      =          =            F 2.3k/13G 
ISA2.F+dc+ic+ctrl-isb                 forbid      =          =            0/6.98G 
SM.F+ic                               allow       =          unsupported  U 0/12.9G 
FOW                                   allow       =          unsupported  U 0/7G 
MP.RF+dc+ctrl-isb-isb                 allow       =          =            U 0/12.94G 
MP.R.RF+addr-cachesync+dmb+ctrl-isb   forbid      =          =            0/6.97G 
MP.RF+dmb+addr-cachesync              allow       =          =            U 0/6.34G 

The hardware observations are the sum of testing seven devices: a Snapdragon 810 
(4x Arm A53 + 4x Arm A57 cores), Tegra K1 (2x NVIDIA Denver cores), Snapdragon 
820 (4x Qualcomm Kryo cores), Exynos 8895 (4x Arm A53 + 4x Samsung Mongoose 2 
cores), Snapdragon 425 (4x Arm A53), Amlogic 905 (4x Arm A53 cores), and Amlogic 
922X (4x Arm A73 + 2x Arm A53 cores). U: allowed but unobserved. F: forbidden but 
observed. 
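
The U and F marks follow mechanically from the architectural intent and the observation counts. A small sketch of that rule, assuming cells of the form observed/total with k, M and G read as decimal multipliers (the helper names `parse_count` and `flag` are ours):

```python
SUFFIX = {"k": 10**3, "M": 10**6, "G": 10**9}


def parse_count(text):
    """Parse counts such as '42.6k', '13G' or '0' into an integer."""
    text = text.strip()
    if text and text[-1] in SUFFIX:
        return round(float(text[:-1]) * SUFFIX[text[-1]])
    return int(text)


def flag(intent, cell):
    """Return 'U' (allowed but unobserved), 'F' (forbidden but observed) or ''."""
    observed, _total = (parse_count(part) for part in cell.split("/"))
    if intent == "allow" and observed == 0:
        return "U"
    if intent == "forbid" and observed > 0:
        return "F"
    return ""
```

For example, `flag("forbid", "2.3k/13G")` yields `"F"`, matching the anomaly reported for MP.FF+cachesync+fpo.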

Our testing revealed a hardware bug in a Snapdragon 820 (4 Qualcomm Kryo 
cores). A version of the first cross-thread synchronisation test of §3.3 but with 
the full cache synchronisation (MP.RF+cachesync+ctrl-isb) exhibited an illegal 
outcome in 84/1.1G runs (not shown in the table), which we have reported. We 
have also seen an anomaly for MP.FF+cachesync+fpo, currently under 
investigation by Arm. Apart from these, the hardware observations are all allowed by 
the models. As usual, specific hardware implementations are sometimes stronger. 

Finally, we ran the 1456 new instruction-fetch diy tests on a variety of hard- 
ware, for around 10M iterations each, and in the operational model. The model 
is sound with respect to the observed hardware behaviour except for that same 
Snapdragon 820 device. 


7 Related Work 


To the best of our knowledge, no previous work establishes well-validated rigor- 
ous semantics for any systems aspects, of any current production architecture, 
in a realistic concurrent setting. 
The closest is Raad et al.’s work on non-volatile memory, which models the 
required cache maintenance for persistent storage in ARMv8-A [39], as an 
extension to the ARMv8-A axiomatic model, and for Intel x86 [38] as an 
operational model, but neither is validated against hardware. In the sequential 
case, Myreen’s JIT compiler verification [33] models x86 icache behaviour with 
an abstract cache that can be arbitrarily updated, cleared on a jmp. For ad- 
dress translation, the authoritative Arm-internal ASL model [40,41,42], and Sail 
model derived from it [11] cover this, and other features sufficient to boot an OS 
(Linux), as do the handwritten Sail models for RISC-V (Linux and FreeBSD) 
and MIPS/CHERI-MIPS (FreeBSD, CheriBSD), but without any cache effects. 
Goel et al. [21,20] describe an ACL2 model for much of x86 that covers address 
translation; and the Forvis [34] and RISCV-PLV [14] Haskell RISC-V ISA mod- 
els are also complete enough to boot Linux. Syeda and Klein [49,50] provide 
a somewhat idealised model for ARMv7 address translation and TLB 
maintenance. Komodo [16] uses a handwritten model for a small part of ARMv7, as 
do Guanciale et al. [25,12]. Romanescu et al. [44,43] do discuss address trans- 
lation in the concurrent setting, but with respect to idealised models. Lustig et 
al. [30] describe a concurrent model for address translation based on the Intel 
Sandy Bridge microarchitecture, combined with a synopsis of some of the rele- 
vant Linux code, but not an architectural semantics for machine-code programs. 


8 Conclusion 


The mainstream architectures are the most important programming languages 
used in practice, and their systems aspects are fundamental to the security (or 
lack thereof) of our computing infrastructure. We have established a robust 
semantics for one of those systems aspects, soundly abstracting the hardware 
complexities to a manageable model that captures the architectural intent. This 
enables future work on reasoning, model-checking, and verification for real sys- 
tems code. 


Abstract. The precision of a static analysis can be improved by 
increasing the context-sensitivity of the analysis. In a type-based formulation 
of static analysis for functional languages this can be achieved by, e.g., 
introducing let-polyvariance or subtyping. In this paper we go one step 
further by defining a higher-ranked polyvariant type system so that even 
properties of lambda-bound identifiers can be generalized over. We do 
this for dependency analysis, a generic analysis that can be instantiated 
to a range of different analyses that in this way all can profit. 

We prove that our analysis is sound with respect to a call-by-name se- 
mantics and that it satisfies a so-called noninterference property. We 
provide a type reconstruction algorithm that we have proven to be ter- 
minating, and sound and complete with respect to its declarative speci- 
fication. Our principled description can serve as a blueprint for making 
other analyses higher-ranked. 


1 Introduction 


The typical compiler for a statically typed functional language will perform a 
number of analyses for validation, optimisation, or both (e.g., strictness anal- 
ysis, control-flow analysis, and binding time analysis). These analyses can be 
specified as a type-based static analysis so that vocabulary, implementation and 
concepts from the world of type systems can be reused in this setting [19,24]. 
In that setting the analysis properties are taken from a language of annotations 
which adorn the types computed for the program during type inference: the anal- 
ysis is specified as an annotated type system, and the payload of the analysis 
corresponds to the annotations computed for a given program. 

Consider for example binding-time analysis [5,7]. In this case, we have a 
two-value lattice of annotations containing S for static and D for dynamic (where 
⊥ = S ⊏ D = ⊤, so that whenever an expression is annotated with S, it 
can be soundly changed to D, because that is a strictly weaker property). An 
expression that is known to be static may be evaluated at compile time, because 
the analysis has determined that all the values that determine its outcome are 
in fact available at compile-time while all other expressions are annotated with 
D, and must be evaluated at run-time; the goal of binding-time analysis is then 
to (soundly) assign S to as many expressions as possible. 
Static analyses may differ in precision, e.g., a monovariant binding-time 
analysis lacks context-sensitivity for let-bound identifiers (although some of it can 
be recovered with subtyping). Assuming id to be the identity function, if in the 
program 


let id x = x in … id s … id d … 


the subexpression s is a statically known integer, which we denote as s : int(S), 
and d : int(D) a dynamic integer, then for id we arrive at int(D) → int(D), so 
that the property found for id s is that it is a dynamic integer. Clearly, however, 
if the value of s is known statically then also that of id s is! The fact that 
values with different properties flow to a function and we have to be (overly) 
pessimistic for some of these is a phenomenon sometimes called poisoning [28]. 
Context-sensitivity reduces poisoning; it can be achieved by making the analysis 
polyvariant. In that case, our type for id may become ∀β.int(β) → int(β), so 
that for the first call to id we may instantiate β with S and for the second 
choose D, essentially mimicking the polymorphic lambda-calculus at the level of 
annotations. 
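
The difference between the two treatments of id can be mimicked on annotations alone. The sketch below is only an illustration of poisoning, not of the type system: the names are ours, and the monovariant case is modelled by joining the annotations of all call sites into one parameter annotation.

```python
from enum import IntEnum
from functools import reduce


class BT(IntEnum):
    """Binding-time lattice: S (static) is below D (dynamic)."""
    S = 0
    D = 1


def join(a, b):
    """Least upper bound in the two-point lattice."""
    return max(a, b)


def monovariant_id(call_args):
    """One annotation for id's parameter, shared by every call site."""
    param = reduce(join, call_args, BT.S)
    return [param for _ in call_args]


def polyvariant_id(call_args):
    """id : ∀β.int(β) → int(β); β is instantiated separately per call."""
    return list(call_args)
```

For the calls id s and id d, the monovariant analysis reports D for both results, while the polyvariant one reports S and D.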
But what about a function like 


foo = λf.(f d, f s) 


in which we have two calls to a lambda-bound function argument f? Can we treat 
these context-sensitively as well, so that we can have the most precise types for 
both calls, independent of each other? The answer is: yes, we can. 

Independence can be achieved by inferring for foo a type that associates with 
f an annotation polymorphic type, 


∀β₁.(∀β₀.int(β₀) → int(β₁ β₀)) 


Here, β₀ ranges over simple annotations (such as S and D), and β₁ ranges over 
annotation level functions (in the terminology of this paper, these annotations 
are higher-sorted; see section 3). The annotation variable β₀ is a placeholder 
for the analysis property of the actual argument to f, while β₁ represents how 
that property propagates to the value returned by f. If the identity function 
∀β.int(β) → int(β) is passed to foo, a pair with annotated type int(D) × int(S) 
will be returned. This is because the types of f d and f s can be determined 
independently of each other, because the choice for β₀ can be made separately 
for each call. The “price” we pay is that we have to know how the annotations on 
the values returned by f can be derived from the annotations on the arguments. 
This is exactly what β₁ represents. 
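
Read operationally, β₁ is just a function on annotations that foo applies once per call. A minimal sketch, with annotations as the strings "S" and "D" and all function names hypothetical:

```python
S, D = "S", "D"


def join(a, b):
    """Least upper bound in the lattice S below D."""
    return D if D in (a, b) else S


def foo_higher_ranked(beta1):
    """foo = λf.(f d, f s): β₀ is chosen per call, β₁ describes f."""
    return (beta1(D), beta1(S))


def foo_monovariant(beta1):
    """Contrast: a single β₀ shared by both calls, so d poisons s."""
    shared = join(D, S)
    return (beta1(shared), beta1(shared))


def identity(beta0):
    """β₁ for the identity function: the result is as static as the argument."""
    return beta0


def const_static(beta0):
    """β₁ for a function that ignores its argument, e.g. λx.0."""
    return S
```

Passing `identity` gives the pair (D, S) described above, whereas the shared-annotation variant gives (D, D).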

If β₀ or β₁ were to range over (annotated) types, then the underlying language 
itself would be higher-ranked, and inference in that case is known to be undecid- 
able [14]. However, as we show in this paper, if they range only over annotations 
(even higher-sorted ones), then inference may become decidable again. Why is 
that? Intuitively, this is because the underlying types provide structure to the 
analysis inference algorithm, while a higher-ranked polymorphic type system 
does not have this advantage. 
In which situations can we expect to benefit from higher-ranked 
polyvariance? Generally speaking, this is when we have functions of order 2 and higher, 
functions that often show up in idiomatic functional code. 

Languages like Haskell do support higher-rank types [13]. Decidability is 
not problematic then, because the compiler expects the programmer to provide 
the higher-rank type signatures where necessary, and the compiler only needs 
to verify that the provided types are consistent: type checking is decidable. In 
our situation this is typically not acceptable: we cannot expect programmers 
to provide explicit control-flow or binding-time information. So we have to 
insist on full inference of analysis information, and this paper shows how this 
can be done for dependency analysis [1]. 

Dependency analysis is in fact a family of analyses; instances include binding- 
time analysis, exception analysis, secure information flow analysis and static 
slicing. The precision of our higher-ranked polyvariant annotated type system 
for dependency analysis thereby carries over immediately to the instances, and 
metatheoretical properties we prove, like a noninterference theorem [8], need to 
be proven only once. 

In summary, this paper offers the following contributions. We (1) define a 
higher-ranked annotation polymorphic type system for a generic dependency 
analysis (section 4) for a call-by-name language that takes its annotations from 
a simply typed lambda-calculus enriched with lattice operations (section 3). The 
analysis also supports polyvariant recursion [10] to improve precision for certain 
recursive functions. Due to the principled way in which the analysis is set up it 
can serve as a blueprint for giving other analyses the same treatment. We (2) 
prove our system sound with respect to a call-by-name operational semantics. We 
also formulate and prove a noninterference theorem for our system (section 5). 
We (3) give a type reconstruction algorithm that is sound and complete with 
respect to the type system (section 6) and provide a prototype implementation 
(section 7). For reasons of space we omit many details that are available in a 
separate document [26]. 


2 Intuition and motivation 


Before we go on to the technical details of this paper, we want to elaborate upon 
our intuitive description from the introduction. We do this by means of a few 
small examples, keeping the discussion informal. Formally discussed examples, 
as generated by our implementation, become big and hard to read pretty quickly; 
these can be found in section 7. 

We start with a few examples in which binding-time analysis is the depen- 
dency analysis instance, followed by a few examples that use security flow anal- 
ysis; our implementation supports both instances. We note that our implemen- 
tation supports a few more language constructs than the formal specification 
given in this paper, giving us a bit more flexibility. Neither, however, supports 
polymorphism at the type level. This substantially simplifies the technicalities. 

For the following example 
foo : ((int → int) → int) → int × int 
foo = λf : (int → int) → int.(f (λx : int.x), f (λx : int.0)) 


our analysis can derive a higher-ranked polyvariant type for f, 


∀β₁.(∀β₂.int(β₂) → int(β₁ β₂)) → int(β₃ β₁) 


where (; and bz can be instantiated independently for each of the two calls to 
f in foo, and p3 is universally bound by foo and represents how the argument f 
uses its function argument. 

Since the argument to f is itself a function, the information that flows out 
of, say, the first call to f can be independent of the analysis of the function 
that flows into the second call (and vice versa), thereby avoiding unnecessary 
poisoning. This means that the binding-time of, say, the second component of 
the pair depends only on f and the function Az : int.0, irrespective of f also 
receiving Ax : int.c as argument to compute the first component. 

For the next example, let us consider security flow analysis in which we have 
annotations L and H that designate values (call these L-values and H-values) 
of low respectively high confidentiality. An important scenario where additional 
precision can be achieved is when analyzing Haskell code in which type classes 
have been desugared to dictionary-passing functional core. A function like 


    g x y = (x + y, y + y)


is then transformed into something like g (+) x y = (x + y, y + y). Now,
consider the case that we pass an H-value to x and an L-value to y; the operator 
(+) produces an L-value if and only if both arguments are L-values. Without 
higher-ranked annotations, the annotation on the first argument to (+) has to be 
consistent with all uses of (+). Because x is an H-value, that will then also be the 
case for the second call to (+), leading to a pair of values of which the components 
are both H-values. With higher-ranked annotations, we can instantiate the two 
instances independently, and the second component of the pair is analyzed to 
produce an L-value. Functions in Haskell that use type classes are extremely 
common. 
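
The difference between the two outcomes can be made concrete with a small sketch. It is not taken from our implementation: the names `Level`, `plus`, `g_monovariant` and `g_polyvariant` are hypothetical, and the monovariant variant only imitates the effect of forcing one annotation per parameter of (+) to cover all call sites.

```python
from enum import IntEnum


class Level(IntEnum):
    """Two-point security lattice: L (low) lies below H (high)."""
    L = 0
    H = 1


def join(a: Level, b: Level) -> Level:
    """Least upper bound in the two-point lattice."""
    return Level(max(a, b))


def plus(left: Level, right: Level) -> Level:
    """Annotation behaviour of (+): the result is L iff both operands are L."""
    return join(left, right)


def g_polyvariant(x: Level, y: Level) -> tuple:
    """Each call to (+) instantiates the operator's annotations independently."""
    return (plus(x, y), plus(y, y))


def g_monovariant(x: Level, y: Level) -> tuple:
    """A single annotation per parameter of (+) has to cover every call site."""
    first_operand = join(x, y)   # first operands seen: x (call 1) and y (call 2)
    second_operand = join(y, y)  # second operands seen: y in both calls
    shared_result = plus(first_operand, second_operand)
    return (shared_result, shared_result)
```

With an H-value for x and an L-value for y, `g_monovariant` reports H for both components, whereas `g_polyvariant` keeps the second component at L.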


3 The $\lambda^{\sqcup}$-calculus


An essential ingredient of our annotated type system is the language of anno- 
tations that we use to decorate our types and to represent the dependencies 
resulting from evaluating an expression. Indeed, the fact that annotations are 
in fact “programs” in a lambda calculus is what allows us to make our analysis 
a higher-ranked polyvariant one. For the purpose of this paper, we generalize
the $\lambda^{\cup}$-calculus of [16] to the $\lambda^{\sqcup}$-calculus ($\lambda^{\sqcup}$ for short), a simply typed lambda
calculus extended with a lattice structure.

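The syntax of $\lambda^{\sqcup}$ is given in figure 1; from now on, we refer to its types
exclusively as sorts. Here, $\kappa$ ranges over sorts, $\beta$ over annotation variables, etc.

\[
\begin{array}{rcll}
\kappa \in \mathbf{AnnSort} & ::= & \star & \text{(base sort)} \\
 & \mid & \kappa_1 \Rightarrow \kappa_2 & \text{(function sort)} \\
\beta \in \mathbf{AnnVar} & & & \text{(annotation variables)} \\
\xi \in \mathbf{AnnTm} & ::= & \beta & \text{(variable)} \\
 & \mid & \lambda\beta :: \kappa.\ \xi & \text{(abstraction)} \\
 & \mid & \xi_1\ \xi_2 & \text{(application)} \\
 & \mid & \ell & \text{(lattice value, } \ell \in \mathcal{L}\text{)} \\
 & \mid & \xi_1 \sqcup \xi_2 & \text{(lattice join operation)}
\end{array}
\]

Fig. 1: The syntax of the $\lambda^{\sqcup}$-calculus, sorts and annotations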


In order to avoid confusion with the field of (algebraic) effects, we refer to terms
of $\lambda^{\sqcup}$ as dependency terms or dependency annotations. Terms are either of base
sort $\star$, representing values in the underlying lattice $\mathcal{L}$, or of function sort $\kappa_1 \Rightarrow \kappa_2$.

On the term level, we allow arbitrary elements of the underlying lattice and
taking binary joins, in addition to the usual variables, function applications and
lambda abstractions. Lattice elements are assumed to be taken from a bounded
join-semilattice $\mathcal{L}$, an algebraic structure $\langle L, \sqcup \rangle$ consisting of an underlying set
$L$ and an associative, commutative and idempotent binary operation $\sqcup$, called
join (we usually write $\ell \in \mathcal{L}$ for $\ell \in L$), and a least element $\bot$.

The sorting rules of $\lambda^{\sqcup}$ are straightforward (see [26]). Values of the underlying
lattice are always of sort $\star$, and the join operator is defined on arbitrary terms
of the same sort:

\[
\frac{\Sigma \vdash_s \xi_1 : \kappa \qquad \Sigma \vdash_s \xi_2 : \kappa}
     {\Sigma \vdash_s \xi_1 \sqcup \xi_2 : \kappa}
\ \text{[S-JOIN]}
\]


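\[
\begin{array}{rcl}
V_{\star} & = & \mathcal{L} \\
V_{\kappa_1 \Rightarrow \kappa_2} & = & \{ f : V_{\kappa_1} \to V_{\kappa_2} \mid f \text{ mono} \} \\
\rho & : & \mathbf{AnnVar} \to_{\mathrm{fin}} \bigcup \{ V_{\kappa} \mid \kappa \in \mathbf{AnnSort} \} \\
[\![\beta]\!]_{\rho} & = & \rho(\beta) \\
[\![\lambda\beta :: \kappa_1.\ \xi]\!]_{\rho} & = & \lambda v \in V_{\kappa_1}.\ [\![\xi]\!]_{\rho[\beta \mapsto v]} \\
[\![\xi_1\ \xi_2]\!]_{\rho} & = & [\![\xi_1]\!]_{\rho}\,([\![\xi_2]\!]_{\rho}) \\
[\![\ell]\!]_{\rho} & = & \ell \\
[\![\xi_1 \sqcup \xi_2]\!]_{\rho} & = & [\![\xi_1]\!]_{\rho} \sqcup [\![\xi_2]\!]_{\rho}
\end{array}
\]

Fig. 2: The semantics of the $\lambda^{\sqcup}$-calculus

The sorting rule uses sort environments denoted by the letter $\Sigma$ that map
annotation variables $\beta$ to sorts $\kappa$. We denote the set of sort environments by
SortEnv. More precisely, a sort environment or sort context $\Sigma$ is a finite list of
bindings from annotation variables $\beta$ to sorts $\kappa$. The empty context is written
as $\emptyset$ (in code as []), and the context $\Sigma$ extended with the binding of the variable
$\beta$ to the sort $\kappa$ is written $\Sigma, \beta : \kappa$. We denote the set of annotation variables
in the context $\Sigma$ with $\mathrm{dom}(\Sigma)$. When we write $\Sigma(\beta) = \kappa$, this means that
$\beta \in \mathrm{dom}(\Sigma)$ and the rightmost occurrence of $\beta$ binds it to $\kappa$. Moreover, $\Sigma \setminus B$
where $B \subseteq \mathbf{AnnVar}$ denotes the context $\Sigma$ where all bindings of annotation
variables in $B$ have been removed. In the remainder of this paper, we shall
overload this notation for all kinds of other environments we shall be needing,
including type environments, and annotated type environments.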
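
As an illustration of sort environments and the sorting rules, the following sketch checks the sort of a dependency term. Only the join case is taken from the rule shown above; the remaining cases are the standard rules of a simply typed lambda calculus and are an assumption here, as are all class and function names.

```python
from dataclasses import dataclass

STAR = "*"  # the base sort


@dataclass(frozen=True)
class Fun:
    """Function sort k1 => k2."""
    arg: object
    res: object


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    var: str
    sort: object
    body: object


@dataclass(frozen=True)
class App:
    fun: object
    arg: object


@dataclass(frozen=True)
class Lit:
    value: object  # an element of the underlying lattice


@dataclass(frozen=True)
class Join:
    left: object
    right: object


def lookup(env, name):
    """Sigma(beta): the rightmost binding of a variable wins."""
    for bound, sort in reversed(env):
        if bound == name:
            return sort
    raise KeyError(name)


def sort_of(env, term):
    """Return the sort of `term` under the sort environment `env` (a list of pairs)."""
    if isinstance(term, Var):
        return lookup(env, term.name)
    if isinstance(term, Lam):
        return Fun(term.sort, sort_of(env + [(term.var, term.sort)], term.body))
    if isinstance(term, App):
        fun_sort = sort_of(env, term.fun)
        if not isinstance(fun_sort, Fun) or fun_sort.arg != sort_of(env, term.arg):
            raise TypeError("ill-sorted application")
        return fun_sort.res
    if isinstance(term, Lit):
        return STAR  # lattice values are always of base sort
    if isinstance(term, Join):
        left = sort_of(env, term.left)
        if left != sort_of(env, term.right):
            raise TypeError("join of terms with different sorts")
        return left  # [S-JOIN]: both operands and the result share one sort
    raise TypeError("unknown term")
```

For instance, `sort_of([], Lam("b", STAR, Join(Var("b"), Lit("H"))))` yields the function sort `Fun(STAR, STAR)`.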

The $\lambda^{\sqcup}$-calculus enjoys a number of properties, many of which are what one
might expect; we have put these and their proofs in [26].

A substitution is a map from variables to terms usually denoted by the letter
$\theta$. The application of a substitution $\theta$ to a term $\xi$ is written $\theta\xi$ and replaces all
free variables in $\xi$ that are also in the domain of $\theta$ with the corresponding terms
they are mapped to. A concrete substitution replacing the variables $\beta_1, \ldots, \beta_n$
with terms $\xi_1, \ldots, \xi_n$ is written $[\xi_1/\beta_1, \ldots, \xi_n/\beta_n]$.

Assuming the usual definitions for the pointwise extension of a lattice $\mathcal{L}$,
and for monotone (order-preserving) functions between lattices, figure 2 shows
the denotational semantics of $\lambda^{\sqcup}$, where we employ the pointwise lifting of $\sqcup$ to
functions to give semantics to the join of $\lambda^{\sqcup}$. The universe $V_{\kappa}$ denotes the lattice
that is represented by the sort $\kappa$. The base sort $\star$ represents the underlying
lattice $\mathcal{L}$ and the function sort $\kappa_1 \Rightarrow \kappa_2$ represents the lattice constructed by
pointwise extension of the lattice $V_{\kappa_2}$, restricted to monotone functions.

The denotation function $[\![\cdot]\!]_{\rho}$ is parameterized with an environment $\rho$ of the
given type that provides the values of variables. The denotation of a lambda
term is simply an element of the corresponding function space. Applications are
therefore mapped directly to the underlying function application of the meta-
theory. This is unlike the $\lambda^{\cup}$-calculus of [16] where lambda terms are mapped
to singleton sets of functions and function application is defined in terms of the
union of the results of individually applying each function. The crucial difference
is that we have offloaded this complexity into the definition of the pointwise
extension of lattices. It is therefore important to note that the join operator
used in the denotation of a term $\xi_1 \sqcup \xi_2$ depends on the sort $\kappa$ of this term and
belongs to the lattice $V_{\kappa}$.

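An environment $\rho : \mathbf{AnnVar} \to_{\mathrm{fin}} \bigcup \{ V_{\kappa} \mid \kappa \in \mathbf{AnnSort} \}$ and a sort
environment $\Sigma$ are compatible if $\mathrm{dom}(\Sigma) = \mathrm{dom}(\rho)$ and for all $\beta \in \mathrm{dom}(\Sigma)$ we have
$\rho(\beta) \in V_{\Sigma(\beta)}$. Given two dependency terms $\xi_1$ and $\xi_2$ and a sort $\kappa$ such that
$\Sigma \vdash_s \xi_1 : \kappa$ and $\Sigma \vdash_s \xi_2 : \kappa$, we say that $\xi_2$ subsumes $\xi_1$ under the environment
$\Sigma$, written $\Sigma \vdash_{sub} \xi_1 \sqsubseteq \xi_2$, if for all environments $\rho$ compatible with $\Sigma$, we have
$[\![\xi_1]\!]_{\rho} \sqsubseteq [\![\xi_2]\!]_{\rho}$. They are semantically equal under $\Sigma$, written $\Sigma \vdash \xi_1 \equiv \xi_2$, if for
all environments $\rho$ compatible with $\Sigma$, we have $[\![\xi_1]\!]_{\rho} = [\![\xi_2]\!]_{\rho}$.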
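
The denotational semantics and the subsumption relation can be sketched as follows. This is a hedged illustration, not the implementation of section 7: terms are encoded as nested tuples, the lattice is fixed to the two-point lattice {0, 1}, and `subsumed` only enumerates environments whose variables are of base sort, which is a simplifying assumption. All names are hypothetical.

```python
from itertools import product

LATTICE = (0, 1)  # two-point lattice, 0 is the least element


def join(a, b):
    """Join of two semantic values; lifted pointwise to functions."""
    if callable(a):
        return lambda v: join(a(v), b(v))
    return max(a, b)


def denote(term, rho):
    """Denotation of a tuple-encoded term under the environment `rho` (a dict)."""
    tag = term[0]
    if tag == "var":
        return rho[term[1]]
    if tag == "lam":
        _, name, _sort, body = term
        return lambda v: denote(body, {**rho, name: v})
    if tag == "app":
        return denote(term[1], rho)(denote(term[2], rho))
    if tag == "lit":
        return term[1]
    if tag == "join":
        return join(denote(term[1], rho), denote(term[2], rho))
    raise ValueError("unknown term")


def environments(sigma):
    """All environments compatible with `sigma` (base-sort variables only)."""
    names = list(sigma)
    for values in product(LATTICE, repeat=len(names)):
        yield dict(zip(names, values))


def subsumed(sigma, smaller, larger):
    """Check smaller ⊑ larger for base-sort terms by exhaustive enumeration."""
    return all(denote(smaller, rho) <= denote(larger, rho)
               for rho in environments(sigma))
```

For example, with `b1 = ("var", "b1")` and `b2 = ("var", "b2")`, the call `subsumed({"b1": "*", "b2": "*"}, b1, ("join", b1, b2))` holds, while the converse does not. Exhaustive enumeration is only feasible for small finite lattices; it serves to make the definition tangible.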


4 The declarative type system 


The types and syntax of our source language are given in figure 3. The types
of our source language consist of a unit type, and product, sum and function
types. As mentioned earlier, let-polymorphism at the type level is not part of the
type system. The language itself is then hardly surprising and includes variables,
a unit constant, lambda abstraction, function application, projection functions
for product types, sum constructors, a sum eliminator (case), fixpoints, seq for
explicitly forcing evaluation in our call-by-name language, and, finally, a special
operation $\mathrm{ann}_{\ell}(t)$ that raises the annotation level of $t$ to $\ell$. We omit the underlying
type system for the source language since it consists mostly of the standard
rules (see [26]). A notable exception is the rule for $\mathrm{ann}_{\ell}(t)$. Such an explicitly
annotated term has the same underlying type as $t$:

\[
\frac{\Gamma \vdash_t t : \tau}
     {\Gamma \vdash_t \mathrm{ann}_{\ell}(t) : \tau}
\ \text{[U-ANN]}
\]

\[
\begin{array}{rcll}
\tau \in \mathbf{Ty} & ::= & \mathrm{unit} & \text{(unit type)} \\
 & \mid & \tau + \tau & \text{(sum type)} \\
 & \mid & \tau \times \tau & \text{(product type)} \\
 & \mid & \tau_1 \to \tau_2 & \text{(function type)} \\
t \in \mathbf{Tm} & ::= & x & \text{(variable)} \\
 & \mid & () & \text{(unit constructor)} \\
 & \mid & \lambda x : \tau.\ t & \text{(abstraction)} \\
 & \mid & t_1\ t_2 & \text{(application)} \\
 & \mid & (t_1, t_2) & \text{(pair constructor)} \\
 & \mid & \mathrm{proj}_i(t) & \text{(pair projections)} \\
 & \mid & \mathrm{inl}_{\tau}(t) \mid \mathrm{inr}_{\tau}(t) & \text{(sum constructors)} \\
 & \mid & \mathbf{case}\ t\ \mathbf{of}\ \{\mathrm{inl}(x) \to t_1; \mathrm{inr}(y) \to t_2\} & \text{(sum eliminator)} \\
 & \mid & \mu x : \tau.\ t & \text{(fixpoint)} \\
 & \mid & \mathbf{seq}\ t_1\ t_2 & \text{(forcing)} \\
 & \mid & \mathrm{ann}_{\ell}(t) & \text{(raise annotation level to } \ell \in \mathcal{L}\text{)}
\end{array}
\]

Fig. 3: The types and terms of the source language

The annotation $\ell$ imposed on $t$ only becomes relevant in the annotated type
system that we discuss next. In the following, we assume the usual definitions
for computing the set of free term variables of a term, $\mathrm{ftv}(t)$.


The annotated type system The source language is simply a desugared 
variant of the functional language a programmer deals with. The target language 
has the same structure, but adds dependency annotations to the source syntax. 
These annotations are the payload of the dependency analysis and are computed
by the algorithm given in section 6 so that the analysis results can be employed
in the back-end of a compiler. In other words, the algorithm elaborates a source
level term into a target term.

The syntax of the target language is shown in figure 4. Annotated types of
the target language are denoted by $\hat\tau$ and annotated terms are denoted by $\hat{t}$.
The annotations that we put on compound types, as well as their components,
are not just there for uniformity. Because of our non-strict semantics and the
presence of seq, we can observe the effects on a pair constructor independently
of its values, so we have separate annotations to represent these.

\[
\begin{array}{rcll}
\hat\tau \in \widehat{\mathbf{Ty}} & ::= & \forall\beta :: \kappa.\ \hat\tau & \text{(annotation quantification)} \\
 & \mid & \mathrm{unit} & \text{(unit type)} \\
 & \mid & \hat\tau_1\langle\xi_1\rangle + \hat\tau_2\langle\xi_2\rangle & \text{(sum type)} \\
 & \mid & \hat\tau_1\langle\xi_1\rangle \times \hat\tau_2\langle\xi_2\rangle & \text{(product type)} \\
 & \mid & \hat\tau_1\langle\xi_1\rangle \to \hat\tau_2\langle\xi_2\rangle & \text{(function type)} \\
\hat{t} \in \widehat{\mathbf{Tm}} & ::= & \cdots & \\
 & \mid & \lambda x : \hat\tau \mathbin{\&} \xi.\ \hat{t} & \text{(abstraction)} \\
 & \mid & \mu x : \hat\tau \mathbin{\&} \xi.\ \hat{t} & \text{(fixpoint)} \\
 & \mid & \Lambda\beta :: \kappa.\ \hat{t} & \text{(dependency abstraction)} \\
 & \mid & \hat{t}\,\langle\xi\rangle & \text{(dependency application)}
\end{array}
\]

Fig. 4: The annotated types and terms of the target language

On the type level, there is an additional construct $\forall\beta :: \kappa.\ \hat\tau$ quantifying over
an annotation variable $\beta$ of sort $\kappa$. Furthermore, the recursive occurrences in the
sum, product and arrow types now each carry an annotation. On the term level,
the explicit type annotations of lambda expressions and fixpoints are now
annotated types and also include a dependency annotation. Moreover, dependency
abstraction and application have been added to reflect the quantification of
dependency variables on the type level. We denote the set of free (term) variables
in a target term $\hat{t}$ by $\mathrm{ftv}(\hat{t})$.

The formal definition of well-formedness for annotated types can be found
in [26]. Informally, a type is well-formed only if all annotations are of sort $\star$ and
all annotation variables that are used have previously been bound.

Below, we assume the unsurprising recursive definitions for computing the
underlying terms $\lfloor\hat{t}\rfloor$ and underlying types $\lfloor\hat\tau\rfloor$ that correspond to annotated
terms $\hat{t}$ and annotated types $\hat\tau$. We also straightforwardly extend the definition
of free annotation variables to annotated types, and denote these by $\mathrm{fav}(\hat\tau)$.


Subtyping To define subtyping we need an auxiliary relation that says when 
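two annotated types $\hat\tau_1$ and $\hat\tau_2$ have the same shape. The unsurprising formal
definition is in [26], but essentially they have the same syntactic structure, and
in the forall case, quantify over the same annotation variable. It can be quite
easily proven that if two types have the same shape, then they have the same
underlying type. This is not true the other way around: the annotated types
$\forall\beta_1.\forall\beta_2.\ \mathrm{int}\langle\beta_1\rangle \to \mathrm{int}\langle\beta_1 \sqcup \beta_2\rangle$ and $\forall\beta_1.\ \mathrm{int}\langle\beta_1\rangle \to \mathrm{int}\langle\beta_1\rangle$ have the same
underlying type, $\mathrm{int} \to \mathrm{int}$, but do not have the same shape.

\[
\frac{}{\Sigma \vdash_{sub} \hat\tau \le \hat\tau}\ \text{[SUB-REFL]}
\qquad
\frac{\Sigma \vdash_{sub} \hat\tau_1 \le \hat\tau_2 \qquad \Sigma \vdash_{sub} \hat\tau_2 \le \hat\tau_3}
     {\Sigma \vdash_{sub} \hat\tau_1 \le \hat\tau_3}\ \text{[SUB-TRANS]}
\]

\[
\frac{\Sigma, \beta : \kappa \vdash_{sub} \hat\tau_1 \le \hat\tau_2}
     {\Sigma \vdash_{sub} \forall\beta :: \kappa.\ \hat\tau_1 \le \forall\beta :: \kappa.\ \hat\tau_2}\ \text{[SUB-FORALL]}
\]

\[
\frac{\Sigma \vdash_{sub} \hat\tau_1 \le \hat\tau_1' \qquad \Sigma \vdash_{sub} \xi_1 \sqsubseteq \xi_1' \qquad
      \Sigma \vdash_{sub} \hat\tau_2 \le \hat\tau_2' \qquad \Sigma \vdash_{sub} \xi_2 \sqsubseteq \xi_2'}
     {\Sigma \vdash_{sub} \hat\tau_1\langle\xi_1\rangle \times \hat\tau_2\langle\xi_2\rangle \le \hat\tau_1'\langle\xi_1'\rangle \times \hat\tau_2'\langle\xi_2'\rangle}\ \text{[SUB-PROD]}
\]

\[
\frac{\Sigma \vdash_{sub} \hat\tau_1' \le \hat\tau_1 \qquad \Sigma \vdash_{sub} \xi_1' \sqsubseteq \xi_1 \qquad
      \Sigma \vdash_{sub} \hat\tau_2 \le \hat\tau_2' \qquad \Sigma \vdash_{sub} \xi_2 \sqsubseteq \xi_2'}
     {\Sigma \vdash_{sub} \hat\tau_1\langle\xi_1\rangle \to \hat\tau_2\langle\xi_2\rangle \le \hat\tau_1'\langle\xi_1'\rangle \to \hat\tau_2'\langle\xi_2'\rangle}\ \text{[SUB-ARR]}
\]

Fig. 5: Subtyping relation ($\Sigma \vdash_{sub} \hat\tau_1 \le \hat\tau_2$), [SUB-SUM] is like [SUB-PROD]

Figure 5 shows the rules defining the subtyping relation on annotated types
of the same shape, which allows us to weaken the annotations on a type to a less
demanding one. Intuitively, a type $\hat\tau_1$ is a subtype of $\hat\tau_2$ under a sort environment
$\Sigma$, written $\Sigma \vdash_{sub} \hat\tau_1 \le \hat\tau_2$, if a value of type $\hat\tau_1$ can be used in places where a value
of type $\hat\tau_2$ is required. The subtyping relation only relates the annotations inside
the types using the subsumption relation $\Sigma \vdash_{sub} \xi_1 \sqsubseteq \xi_2$ between dependency
terms. Moreover, the subtyping relation implicitly demands that both types are
well-formed under the environment. The [SUB-FORALL] rule requires that the
quantified variable has the same name in both types. This is not a restriction,
as we can simply rename the variables in one or both of the types accordingly
in order to make them match and prevent unintentional capturing of previously
free variables. Note that [SUB-ARR] is contravariant for argument positions. We
omitted [SUB-SUM] which can be derived from [SUB-PROD] by replacing $\times$ with
$+$.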
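
The variance of the subtyping rules can be illustrated with a structural check. The sketch below makes two simplifying assumptions that are not part of the formal system: annotations are closed values of the two-point lattice {0, 1} (so subsumption degenerates to `<=`), and quantified types are left out. The tuple encoding and the name `subtype` are hypothetical.

```python
UNIT = ("unit",)


def subtype(left, right):
    """Check left <= right for annotated types of the same shape.

    Types are ("unit",) or (tag, (type1, ann1), (type2, ann2)) with tag in
    {"prod", "sum", "arr"}; annotations are lattice values 0 (low) or 1 (high).
    """
    if left[0] != right[0]:
        return False  # different shapes are never related
    if left[0] == "unit":
        return True
    (l1, la1), (l2, la2) = left[1], left[2]
    (r1, ra1), (r2, ra2) = right[1], right[2]
    if left[0] in ("prod", "sum"):
        # covariant in both components and their annotations
        return subtype(l1, r1) and la1 <= ra1 and subtype(l2, r2) and la2 <= ra2
    if left[0] == "arr":
        # contravariant in the argument, covariant in the result
        return subtype(r1, l1) and ra1 <= la1 and subtype(l2, r2) and la2 <= ra2
    return False
```

A function that accepts high arguments and returns a low result, `("arr", (UNIT, 1), (UNIT, 0))`, is a subtype of `("arr", (UNIT, 0), (UNIT, 1))`, but not the other way around. Reflexivity and transitivity need no separate cases here because the structural check already satisfies both.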


The annotated type rules An annotated type environment $\hat\Gamma$ is defined
analogously to sort environments, but instead maps term variables $x$ to pairs of an
annotated type $\hat\tau$ and a dependency term $\xi$. We extend the definition of the set
of free annotation variables to annotated environments by taking the union of
the free annotation variables of all annotated types and dependency terms
occurring in the environment, denoted by $\mathrm{fav}(\hat\Gamma)$. We denote the set of annotated
type environments by AnnTyEnv.


We now have all the definitions in place in order to define the declarative
annotated type system shown in figure 6. It consists of judgments of the form
$\Sigma \mid \hat\Gamma \vdash_{te} \hat{t} : \hat\tau \mathbin{\&} \xi$ expressing that under the sort environment $\Sigma$ and the
annotated type environment $\hat\Gamma$ the annotated term $\hat{t}$ has the annotated type $\hat\tau$
and the dependency term $\xi$. The dependency term in this context is also called
the dependency term of $\hat{t}$ (see footnote 1). It is implicitly assumed that every type $\hat\tau$ is also
well-formed under $\Sigma$, i.e. $\Sigma \vdash_{wft} \hat\tau$, and that the resulting dependency annotation $\xi$
is of sort $\star$, i.e. $\Sigma \vdash_s \xi : \star$.

We now discuss some of the more interesting rules of figure 6. In [T-VAR],
both the annotated type and the dependency annotation are looked up in the
environment. The dependency annotation of the unit value defaults to the least
annotation in [T-UNIT]. While we could admit an arbitrary dependency anno-
tation here, the same can be achieved by using the subtyping rule [T-SUB]. We
employ this principle more often, e.g., in [T-ABS], and [T-PAIR]. This essentially
means that the context in which such a term is used completely determines the 
annotation. 

The rule [T-APP] may seem overly restrictive by requiring that the types
and dependency annotations of the arguments match, and that the dependency
annotations of the return value and the function itself are the same. However, in
combination with the subtyping rule [T-SUB], this effectively does not restrict
the analysis in any way. We see the same happening in other rules, such as
[T-CASE] and [T-PROJ]. Note that the dependency annotation of the argument
does not play a role in the resulting dependency annotation of the application.
This is because we are dealing with a call-by-name semantics which means that
the argument is not necessarily evaluated before the function call. It should be 
noted that this does not mean that the dependency annotations of arguments 
are ignored completely. If the body of a function makes use of an argument, the 
type system makes sure that its dependency annotation is also incorporated into 
the result. 

When constructing a pair (rule [T-PAIR]), the dependency annotations of
the components are stored in the type while the pair itself is assigned the least
dependency annotation. When accessing a component of a pair (rule [T-PROJ]),
we require that the dependency annotation of the pair matches the dependency 
annotation of the projected component. Again, this is no restriction due to the 
subtyping rule. 

In [T-INL/INR], the argument to the injection constructor only determines
the type and annotation of one component of the sum type while the other
component can be chosen arbitrarily as long as the underlying type matches the
annotation on the constructor. The destruction of sum types happens in a case
statement that is handled by rule [T-CASE]. Again, to keep the rule simple and
without loss of precision due to judicious use of rule [T-SUB], we may demand
that the types of both branches match, and that additionally the dependency 
annotations of both branches and the scrutinee are equal. 

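The annotation rule [T-ANN] requires that the dependency annotation of
the term being annotated is at least as large as the lattice element $\ell$. In the
fixpoint rule, [T-FIX], not only the types but also the dependency annotations
of the term itself and the bound variables must match. Note that this rule also
admits polyvariant recursion [23], since quantification can occur anywhere in
an annotated type. Since $\mathbf{seq}\ \hat{t}_1\ \hat{t}_2$ forces the evaluation of its first argument,
it requires that $\hat{t}_1$'s dependency annotation is part of the final result. This is
justified, because the result depends on the termination behavior of $\hat{t}_1$.


1 Following the literature of type and effect systems we would much like to use the
term “effect” at this point, but decided to use a different term to avoid confusion
with the literature on effect handlers.

\[
\frac{\hat\Gamma(x) = \hat\tau \mathbin{\&} \xi}
     {\Sigma \mid \hat\Gamma \vdash_{te} x : \hat\tau \mathbin{\&} \xi}\ \text{[T-VAR]}
\qquad
\frac{}{\Sigma \mid \hat\Gamma \vdash_{te} () : \mathrm{unit} \mathbin{\&} \bot}\ \text{[T-UNIT]}
\]

\[
\frac{\Sigma \mid \hat\Gamma, x : \hat\tau_1 \mathbin{\&} \xi_1 \vdash_{te} \hat{t} : \hat\tau_2 \mathbin{\&} \xi_2}
     {\Sigma \mid \hat\Gamma \vdash_{te} \lambda x : \hat\tau_1 \mathbin{\&} \xi_1.\ \hat{t} : \hat\tau_1\langle\xi_1\rangle \to \hat\tau_2\langle\xi_2\rangle \mathbin{\&} \bot}\ \text{[T-ABS]}
\]

\[
\frac{\Sigma \mid \hat\Gamma \vdash_{te} \hat{t}_1 : \hat\tau_1\langle\xi_1\rangle \to \hat\tau_2\langle\xi_2\rangle \mathbin{\&} \xi_2 \qquad
      \Sigma \mid \hat\Gamma \vdash_{te} \hat{t}_2 : \hat\tau_1 \mathbin{\&} \xi_1}
     {\Sigma \mid \hat\Gamma \vdash_{te} \hat{t}_1\ \hat{t}_2 : \hat\tau_2 \mathbin{\&} \xi_2}\ \text{[T-APP]}
\]

\[
\frac{\Sigma \mid \hat\Gamma \vdash_{te} \hat{t}_1 : \hat\tau_1 \mathbin{\&} \xi_1 \qquad
      \Sigma \mid \hat\Gamma \vdash_{te} \hat{t}_2 : \hat\tau_2 \mathbin{\&} \xi_2}
     {\Sigma \mid \hat\Gamma \vdash_{te} (\hat{t}_1, \hat{t}_2) : \hat\tau_1\langle\xi_1\rangle \times \hat\tau_2\langle\xi_2\rangle \mathbin{\&} \bot}\ \text{[T-PAIR]}
\]

\[
\frac{\Sigma \mid \hat\Gamma \vdash_{te} \hat{t} : \hat\tau_1\langle\xi_1\rangle \times \hat\tau_2\langle\xi_2\rangle \mathbin{\&} \xi_i}
     {\Sigma \mid \hat\Gamma \vdash_{te} \mathrm{proj}_i(\hat{t}) : \hat\tau_i \mathbin{\&} \xi_i}\ \text{[T-PROJ]}
\]

\[
\frac{\Sigma \mid \hat\Gamma \vdash_{te} \hat{t} : \hat\tau_1 \mathbin{\&} \xi_1 \qquad \lfloor\hat\tau_2\rfloor = \tau_2}
     {\Sigma \mid \hat\Gamma \vdash_{te} \mathrm{inl}_{\tau_2}(\hat{t}) : \hat\tau_1\langle\xi_1\rangle + \hat\tau_2\langle\xi_2\rangle \mathbin{\&} \bot}\ \text{[T-INL]}
\]

\[
\frac{\Sigma \mid \hat\Gamma \vdash_{te} \hat{t} : \hat\tau_2 \mathbin{\&} \xi_2 \qquad \lfloor\hat\tau_1\rfloor = \tau_1}
     {\Sigma \mid \hat\Gamma \vdash_{te} \mathrm{inr}_{\tau_1}(\hat{t}) : \hat\tau_1\langle\xi_1\rangle + \hat\tau_2\langle\xi_2\rangle \mathbin{\&} \bot}\ \text{[T-INR]}
\]

\[
\frac{\begin{array}{c}
      \Sigma \mid \hat\Gamma \vdash_{te} \hat{t} : \hat\tau_1\langle\xi_1\rangle + \hat\tau_2\langle\xi_2\rangle \mathbin{\&} \xi \\
      \Sigma \mid \hat\Gamma, x : \hat\tau_1 \mathbin{\&} \xi_1 \vdash_{te} \hat{t}_1 : \hat\tau \mathbin{\&} \xi \qquad
      \Sigma \mid \hat\Gamma, y : \hat\tau_2 \mathbin{\&} \xi_2 \vdash_{te} \hat{t}_2 : \hat\tau \mathbin{\&} \xi
      \end{array}}
     {\Sigma \mid \hat\Gamma \vdash_{te} \mathbf{case}\ \hat{t}\ \mathbf{of}\ \{\mathrm{inl}(x) \to \hat{t}_1; \mathrm{inr}(y) \to \hat{t}_2\} : \hat\tau \mathbin{\&} \xi}\ \text{[T-CASE]}
\]

\[
\frac{\Sigma \mid \hat\Gamma \vdash_{te} \hat{t} : \hat\tau \mathbin{\&} \xi \qquad \Sigma \vdash_{sub} \ell \sqsubseteq \xi}
     {\Sigma \mid \hat\Gamma \vdash_{te} \mathrm{ann}_{\ell}(\hat{t}) : \hat\tau \mathbin{\&} \xi}\ \text{[T-ANN]}
\qquad
\frac{\Sigma \mid \hat\Gamma, x : \hat\tau \mathbin{\&} \xi \vdash_{te} \hat{t} : \hat\tau \mathbin{\&} \xi}
     {\Sigma \mid \hat\Gamma \vdash_{te} \mu x : \hat\tau \mathbin{\&} \xi.\ \hat{t} : \hat\tau \mathbin{\&} \xi}\ \text{[T-FIX]}
\]

\[
\frac{\Sigma \mid \hat\Gamma \vdash_{te} \hat{t}_1 : \hat\tau_1 \mathbin{\&} \xi \qquad
      \Sigma \mid \hat\Gamma \vdash_{te} \hat{t}_2 : \hat\tau_2 \mathbin{\&} \xi}
     {\Sigma \mid \hat\Gamma \vdash_{te} \mathbf{seq}\ \hat{t}_1\ \hat{t}_2 : \hat\tau_2 \mathbin{\&} \xi}\ \text{[T-SEQ]}
\]

\[
\frac{\Sigma \mid \hat\Gamma \vdash_{te} \hat{t} : \hat\tau' \mathbin{\&} \xi' \qquad
      \Sigma \vdash_{sub} \hat\tau' \le \hat\tau \qquad \Sigma \vdash_{sub} \xi' \sqsubseteq \xi}
     {\Sigma \mid \hat\Gamma \vdash_{te} \hat{t} : \hat\tau \mathbin{\&} \xi}\ \text{[T-SUB]}
\]

\[
\frac{\Sigma, \beta : \kappa \mid \hat\Gamma \vdash_{te} \hat{t} : \hat\tau \mathbin{\&} \xi \qquad
      \beta \notin \mathrm{fav}(\hat\Gamma) \cup \mathrm{fav}(\xi)}
     {\Sigma \mid \hat\Gamma \vdash_{te} \Lambda\beta :: \kappa.\ \hat{t} : \forall\beta :: \kappa.\ \hat\tau \mathbin{\&} \xi}\ \text{[T-ANNABS]}
\]

\[
\frac{\Sigma \mid \hat\Gamma \vdash_{te} \hat{t} : \forall\beta :: \kappa.\ \hat\tau \mathbin{\&} \xi \qquad
      \Sigma \vdash_s \xi' : \kappa}
     {\Sigma \mid \hat\Gamma \vdash_{te} \hat{t}\,\langle\xi'\rangle : [\xi'/\beta]\hat\tau \mathbin{\&} \xi}\ \text{[T-ANNAPP]}
\]

Fig. 6: Declarative annotated type system ($\Sigma \mid \hat\Gamma \vdash_{te} \hat{t} : \hat\tau \mathbin{\&} \xi$)

\[
\begin{array}{rcl}
v' \in \mathbf{Nf}' & ::= & \lambda x : \hat\tau \mathbin{\&} \xi.\ \hat{t} \mid \Lambda\beta :: \kappa.\ \hat{t} \mid () \mid \mathrm{inl}_{\tau}(\hat{t}) \mid \mathrm{inr}_{\tau}(\hat{t}) \mid (\hat{t}_1, \hat{t}_2) \\
v \in \mathbf{Nf} & ::= & v' \mid \mathrm{ann}_{\ell}(v')
\end{array}
\]

Fig. 7: Values in the target language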

The subtyping rule [T-SUB] allows us to weaken the annotations nested inside
a type through the subtyping relation (see figure 5), as well as the dependency
annotation itself through the subsumption relation. The rule [T-ANNABS]
introduces an annotation variable $\beta$ of sort $\kappa$ in the body $\hat{t}$ of the abstraction.
The second premise ensures that the annotation variable does not escape its
scope determined by the quantification on the type level. The annotation
application rule [T-ANNAPP] allows the instantiation of an annotation variable with
an arbitrary well-sorted dependency term.


5 Metatheory 


In this section we develop a noninterference proof for our declarative type system, 
based on a small-step operational call-by-name semantics for the target language. 

Figure 7 defines the values of the target language, i.e. those terms that cannot
be further evaluated. Apart from a technicality related to annotations, they
correspond exactly to the weak head normal forms of terms. The distinction for
$\mathbf{Nf}' \subset \mathbf{Nf}$ is made to ensure that there is at most one annotation at top level.

The semantics itself is largely straightforward, except for the handling of 
annotations. These are moved just as far outwards as necessary in order to 
reach a normal form, thereby computing the least “permission” an evaluator 
must possess for computing a certain output. Figure 8 shows two rules: a lifting
rule (for applications) and the rule for merging adjacent annotations (see the 
supplemental material for the others). 
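
The two rules named above, lifting an annotation out of an application and merging adjacent annotations, can be sketched as a partial step function. The tuple encoding of terms, the integer encoding of lattice levels and the names `step` and `is_unannotated_value` are hypothetical; all other reduction rules are deliberately omitted.

```python
def is_unannotated_value(term):
    """Membership in Nf': values without a top-level annotation."""
    return term[0] in ("lam", "annlam", "unit", "pair", "inl", "inr")


def step(term):
    """Apply the lifting or the merging rule at the root, or return None."""
    # Lifting: (ann_l(v')) t2  -->  ann_l(v' t2)
    if (term[0] == "app" and term[1][0] == "ann"
            and is_unannotated_value(term[1][2])):
        _, (_, level, value), argument = term
        return ("ann", level, ("app", value, argument))
    # Merging: ann_l1(ann_l2(v'))  -->  ann_(l1 join l2)(v')
    if (term[0] == "ann" and term[2][0] == "ann"
            and is_unannotated_value(term[2][2])):
        _, outer, (_, inner, value) = term
        return ("ann", max(outer, inner), value)
    return None
```

Applying `step` to `("app", ("ann", 1, lam), ("unit",))` for some lambda value `lam` moves the annotation outwards, so that the level an evaluator needs is recorded on the result of the application rather than on the function.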

\[
\frac{v' \in \mathbf{Nf}'}
     {(\mathrm{ann}_{\ell}(v'))\ \hat{t}_2 \to \mathrm{ann}_{\ell}(v'\ \hat{t}_2)}\ \text{[E-LIFTAPP]}
\qquad
\frac{v' \in \mathbf{Nf}'}
     {\mathrm{ann}_{\ell_1}(\mathrm{ann}_{\ell_2}(v')) \to \mathrm{ann}_{\ell_1 \sqcup \ell_2}(v')}\ \text{[E-JOINANN]}
\]

Fig. 8: Small-step semantics ($\hat{t} \to \hat{t}'$) (excerpt)

In the remainder of this section we state the standard progress and subject
reduction theorems that ensure that our small-step semantics is compatible with
the annotated type system. The following progress theorem demonstrates that
any well-typed term is in normal form, or an evaluation step can be performed.


Theorem 1 (Progress). If $\emptyset \mid \emptyset \vdash_{te} \hat{t} : \hat\tau \mathbin{\&} \xi$, then either $\hat{t} \in \mathbf{Nf}$ or there is a
$\hat{t}'$ such that $\hat{t} \to \hat{t}'$.


The subject reduction property says that the reduction of a well-typed term 
results in a term of the same type. 


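Theorem 2 (Subject Reduction). If $\emptyset \mid \emptyset \vdash_{te} \hat{t} : \hat\tau \mathbin{\&} \xi$ and there is a $\hat{t}'$ such
that $\hat{t} \to \hat{t}'$, then $\emptyset \mid \emptyset \vdash_{te} \hat{t}' : \hat\tau \mathbin{\&} \xi$.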


As expected, subject reduction extends naturally to a sequence of reductions 
by induction on the length of the reduction sequence: 


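Corollary 1. If we have $\emptyset \mid \emptyset \vdash_{te} \hat{t} : \hat\tau \mathbin{\&} \xi$ and $\hat{t} \to^{*} v$, then $\emptyset \mid \emptyset \vdash_{te} v : \hat\tau \mathbin{\&} \xi$,


where, as usual, we write $\hat{t} \to^{*} v$ if there is a finite sequence of terms $(\hat{t}_i)_{0 \le i \le n}$
with $\hat{t}_0 = \hat{t}$ and $\hat{t}_n = v \in \mathbf{Nf}$ and reductions $(\hat{t}_i \to \hat{t}_{i+1})_{0 \le i < n}$ between them. If
there is no such sequence, this is denoted by $\hat{t} \Uparrow$ and $\hat{t}$ is said to diverge.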

Finally, if a term evaluates to an annotated value, this annotation is com- 
patible with the dependency annotation that has been assigned to the term: 


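Theorem 3 (Semantic Soundness). If we have $\emptyset \mid \emptyset \vdash_{te} \hat{t} : \hat\tau \mathbin{\&} \xi$ and $\hat{t} \to^{*}
\mathrm{ann}_{\ell}(v')$, then $\emptyset \vdash_{sub} \ell \sqsubseteq \xi$.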


The noninterference property An important theorem for the safety of pro- 
gram transformations/optimizations using the results of dependency analysis is 
noninterference. It guarantees that if there is a target term $\hat{t}$ depending on some
variable $x$ such that $\emptyset \mid x : \hat\tau' \mathbin{\&} \xi' \vdash_{te} \hat{t} : \hat\tau \mathbin{\&} \xi$ holds and the dependency annotation
$\xi'$ of the variable is not encompassed by the resulting dependency annotation
$\xi$ (i.e. $\emptyset \vdash_{sub} \xi' \not\sqsubseteq \xi$), then $\hat{t}$ will always evaluate to the same normal form,
regardless of the value of $x$.

Since we are in a non-strict setting, our noninterference property only applies 
to the topmost constructors of values. This is because the dependency annota- 
tions derived in the annotated type system only provide information about the 
evaluation to weak head normal form. Nested terms might possess lower as well 
as higher classifications. In particular, the subterms with greater dependency 
annotations than their enclosing constructors prevent us from making a more 
general statement because those can still depend on the context whereas the top- 
level constructor cannot. In the noninterference theorem presented for the SLam 
calculus, this problem is circumvented by restricting the statement to so called 
transparent types, where the annotations of nested components are decreasing 
when moving further inward [9]. 

In the following we consider two normal forms $v_1, v_2 \in \mathbf{Nf}$ to be similar,
denoted $v_1 \sim v_2$, if their top level constructors (and annotations, if present) match
(see the supplemental material for the unsurprising definition of $\sim$). So, $v_1 \sim v_2$
implies that these two values are indistinguishable without further evaluation,
which is the property guaranteed by the noninterference theorem.

Theorem 4 (Noninterference). Let $\hat{t}$ be a target term such that $\emptyset \mid x : \hat\tau' \mathbin{\&}
\xi' \vdash_{te} \hat{t} : \hat\tau \mathbin{\&} \xi$ and $\emptyset \vdash_{sub} \xi' \not\sqsubseteq \xi$. Let $v$ be a value.

If there is a $\hat{t}_1$ with $\emptyset \mid \emptyset \vdash_{te} \hat{t}_1 : \hat\tau' \mathbin{\&} \xi'$ such that $[\hat{t}_1/x]\hat{t} \to^{*} v$, then there is
a $\hat{t}'$ such that for all $\hat{t}_2$ with $\emptyset \mid \emptyset \vdash_{te} \hat{t}_2 : \hat\tau' \mathbin{\&} \xi'$ we have $[\hat{t}_2/x]\hat{t} \to^{*} [\hat{t}_2/x]\hat{t}'$
and $[\hat{t}_1/x]\hat{t}' \sim [\hat{t}_2/x]\hat{t}'$.


The noninterference proofs crucially rely on the fact that the source term 
is well-typed, and the additional assumption $\emptyset \vdash_{sub} \xi' \not\sqsubseteq \xi$ stating that the
dependency annotation of the variable in the context is not encompassed by the 
dependency annotation of the term being evaluated. 

By introducing the restriction to transparent types, we can recover the no- 
tion of noninterference used for the SLam calculus. For example, if we have a 
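transparent type $\hat\tau_1\langle\xi_1\rangle \times \hat\tau_2\langle\xi_2\rangle \mathbin{\&} \xi$ (i.e. $\emptyset \vdash_{sub} \xi_1 \sqsubseteq \xi$ and $\emptyset \vdash_{sub} \xi_2 \sqsubseteq \xi$) and
$\emptyset \vdash_{sub} \xi' \not\sqsubseteq \xi$ holds, then we also know $\emptyset \vdash_{sub} \xi' \not\sqsubseteq \xi_1$ and $\emptyset \vdash_{sub} \xi' \not\sqsubseteq \xi_2$.
Otherwise, we would get $\emptyset \vdash_{sub} \xi' \sqsubseteq \xi$ by transitivity, contradicting the assumption.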
This means all prerequisites of the noninterference theorem are still fulfilled. 

Hence, it is possible in these cases to apply the noninterference theorem to 
the nested (possibly unevaluated) subterms of a constructor in weak head normal 
form. As in the work of [1], our noninterference theorem is restricted to deal with
terms depending on exactly one variable. 


6 The type reconstruction algorithm 


Modularity considerations When designing the type reconstruction algo- 
rithm we have two goals: it should be a conservative extension of the underlying 
type system, and types assigned by the analysis should be as general as possible. 
Concretely, a function’s type must be general enough to be able to adapt to 
arguments with arbitrary annotations. These two goals give rise to the notion 
of fully flexible and fully parametric types defined by [12]; [16] calls these types
conservative and pattern types, respectively. Informally, an annotated type is a
pattern type if it can be instantiated to any conservative type of the same shape 
and a conservative type is an analysis of an expression that is able to cope with 
any arguments it might depend on. These types are conservative in the sense 
that they make the least assumptions about their arguments and therefore are a 
conservative estimate compared to other typings with fewer degrees of freedom. 

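\[
\frac{}
     {\overline{\alpha_i :: \kappa_{\alpha_i}} \vdash_p \mathrm{unit} \mathbin{\&} \beta\ \overline{\alpha_i} \triangleright \beta :: \overline{\kappa_{\alpha_i}} \Rightarrow \star}\ \text{[P-UNIT]}
\]

\[
\frac{\overline{\alpha_i :: \kappa_{\alpha_i}} \vdash_p \hat\tau_1 \mathbin{\&} \xi_1 \triangleright \overline{\beta_j :: \kappa_{\beta_j}} \qquad
      \overline{\alpha_i :: \kappa_{\alpha_i}} \vdash_p \hat\tau_2 \mathbin{\&} \xi_2 \triangleright \overline{\gamma_k :: \kappa_{\gamma_k}}}
     {\overline{\alpha_i :: \kappa_{\alpha_i}} \vdash_p \hat\tau_1\langle\xi_1\rangle \times \hat\tau_2\langle\xi_2\rangle \mathbin{\&} \beta\ \overline{\alpha_i} \triangleright
      \beta :: \overline{\kappa_{\alpha_i}} \Rightarrow \star,\ \overline{\beta_j :: \kappa_{\beta_j}},\ \overline{\gamma_k :: \kappa_{\gamma_k}}}\ \text{[P-PROD]}
\]

\[
\frac{\emptyset \vdash_p \hat\tau_1 \mathbin{\&} \xi_1 \triangleright \overline{\beta_j :: \kappa_{\beta_j}} \qquad
      \overline{\alpha_i :: \kappa_{\alpha_i}}, \overline{\beta_j :: \kappa_{\beta_j}} \vdash_p \hat\tau_2 \mathbin{\&} \xi_2 \triangleright \overline{\gamma_k :: \kappa_{\gamma_k}}}
     {\overline{\alpha_i :: \kappa_{\alpha_i}} \vdash_p \forall\overline{\beta_j :: \kappa_{\beta_j}}.\ \hat\tau_1\langle\xi_1\rangle \to \hat\tau_2\langle\xi_2\rangle \mathbin{\&} \beta\ \overline{\alpha_i} \triangleright
      \beta :: \overline{\kappa_{\alpha_i}} \Rightarrow \star,\ \overline{\gamma_k :: \kappa_{\gamma_k}}}\ \text{[P-ARR]}
\]

Fig. 9: Pattern types ($\Sigma \vdash_p \hat\tau \mathbin{\&} \xi \triangleright \Sigma'$), where $\beta \notin \overline{\alpha_i}, \overline{\beta_j}, \overline{\gamma_k}$, and [P-SUM] is like
[P-PROD]

For a pattern type to be instantiable to any conservative type, we first need
to make sure that all dependency annotations occurring in it can be instantiated
to the corresponding dependency terms in a matching conservative type. This
leads to the following definition of a pattern in the $\lambda^{\sqcup}$-calculus. It is based
on the similar definition by [16], which in turn is a special case of a pattern
in higher-order unification theory [4,21]. A $\lambda^{\sqcup}$-term is a pattern if it is of the
form $f\ \beta_1 \cdots \beta_n$ where $f$ is a free variable and $\beta_1, \ldots, \beta_n$ are distinct bound
variables. A unification problem of the form $\forall\beta_1 \cdots \beta_n.\ f\ \beta_1 \cdots \beta_n = \xi$ where
the left-hand side is a pattern is called pattern unification. A pattern unification
problem $\forall\beta_1 \cdots \beta_n.\ f\ \beta_1 \cdots \beta_n = \xi$ has a unique most general solution, namely
the substitution $[f \mapsto \lambda\beta_1. \cdots \lambda\beta_n.\ \xi]$.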

The definition of a pattern is then extended to annotated types using the rules
from figure 9. Our definition is more precise than the one from previous work in
that it makes explicit which variables are expected to be bound and which are 
free. We require that all variables with different names in the definition of these 
rules are distinct from each other. 

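An annotated type and dependency pair $\hat\tau \mathbin{\&} \xi$ is a pattern type under the
sort environment $\Sigma$ if the judgment $\Sigma \vdash_p \hat\tau \mathbin{\&} \xi \triangleright \Sigma'$ holds for some $\Sigma'$. We call
the variables in $\Sigma$ argument variables and the variables in $\Sigma'$ pattern variables.


Example 1. A simple pattern type with the pattern variables $\beta :: \star \Rightarrow \star$ and
$\beta' :: \star \Rightarrow \star \Rightarrow \star$ is

\[
\forall\beta_1 :: \star.\ \mathrm{unit}\langle\beta_1\rangle \to (\forall\beta_2 :: \star.\ \mathrm{unit}\langle\beta_2\rangle \to \mathrm{unit}\langle\beta'\ \beta_1\ \beta_2\rangle)\langle\beta\ \beta_1\rangle
\]

Note that since $\beta_1$ is quantified on the function arrow chain, it is passed on to the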
second function arrow. However, it is not propagated into the second argument. 
In general, annotations on the return type may depend on the annotations of all 
previous arguments while annotations of the arguments may not. This prevents 
any dependency between the annotations of arguments and guarantees that they 
are as permissive as possible. This is also why pattern variables in a covariant 
position are passed on to the next higher level while pattern variables in argu- 
ments are quantified in the enclosing function arrow. This allows the caller of 
a function to instantiate the dependency annotations of the parameters to the 
actual arguments. 


As we stated earlier, a conservative function type makes the least assumptions 
over its arguments. Formally, this means that arguments of conservative func- 
tions are pattern types. We will later see that a pattern type can be instantiated 
to any conservative type of the same shape. On the other hand, non-functional 
conservative types are not constrained in their annotations. These
characteristics are captured by the following definition based on conservative types [16] and
fully flexible types [12].

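An annotated type $\hat\tau$ is conservative if

1. $\hat\tau = \mathrm{unit}$, or

2. $\hat\tau = \hat\tau_1\langle\xi_1\rangle + \hat\tau_2\langle\xi_2\rangle$ and both $\hat\tau_1$ and $\hat\tau_2$ are conservative, or

3. $\hat\tau = \hat\tau_1\langle\xi_1\rangle \times \hat\tau_2\langle\xi_2\rangle$ and both $\hat\tau_1$ and $\hat\tau_2$ are conservative, or

4. $\hat\tau = \forall\overline{\beta_j :: \kappa_j}.\ \hat\tau_1\langle\xi_1\rangle \to \hat\tau_2\langle\xi_2\rangle$ and both (a) $\emptyset \vdash_p \hat\tau_1 \mathbin{\&} \xi_1 \triangleright \overline{\beta_j :: \kappa_j}$ and (b) $\hat\tau_2$ is
conservative.


Moreover, an annotated type and dependency pair $\hat\tau \mathbin{\&} \xi$ is conservative if $\hat\tau$
is conservative and an annotated type environment $\hat\Gamma$ is conservative if for all
$x \in \mathrm{dom}(\hat\Gamma)$, $\hat\Gamma(x)$ is conservative.

The following type signature for the function $f$ is a conservative type that
takes the function type from example 1 as an argument.

\[
\begin{array}{rl}
f : & \forall\beta :: \star \Rightarrow \star.\ \forall\beta' :: \star \Rightarrow \star \Rightarrow \star.\ \forall\beta_3 :: \star. \\
    & (\forall\beta_1 :: \star.\ \mathrm{unit}\langle\beta_1\rangle \to (\forall\beta_2 :: \star.\ \mathrm{unit}\langle\beta_2\rangle \to \mathrm{unit}\langle\beta'\ \beta_1\ \beta_2\rangle)\langle\beta\ \beta_1\rangle)\langle\beta_3\rangle \\
    & \to \mathrm{unit}\langle\beta_3 \sqcup \beta\ \bot \sqcup \beta'\ \bot\ \bot\rangle \mathbin{\&} \bot
\end{array}
\]

Note that the pattern variables of the argument have been bound in the
top-level function type. This allows callers of $f$ to instantiate these patterns.

We can extend the previous definition of pattern types to the type completion
relation shown in figure 10. It relates every underlying type $\tau$ with a pattern type
$\hat\tau$ such that $\hat\tau$ erases to $\tau$. It is defined through judgments $\Sigma \vdash_c \tau : \hat\tau \mathbin{\&} \xi \triangleright \Sigma'$ with
the meaning that under the sort environment $\Sigma$, $\tau$ is completed to the annotated
type $\hat\tau$ and the dependency annotation $\xi$ containing the pattern variables $\Sigma'$.
The completion relation can also be interpreted as a function taking $\Sigma$ and $\tau$ as
arguments and returning $\hat\tau$, $\xi$ and $\Sigma'$.

\[
\frac{\beta\ \text{fresh}}
     {\overline{\alpha_i :: \kappa_{\alpha_i}} \vdash_c \mathrm{unit} : \mathrm{unit} \mathbin{\&} \beta\ \overline{\alpha_i} \triangleright \beta :: \overline{\kappa_{\alpha_i}} \Rightarrow \star}\ \text{[C-UNIT]}
\]

\[
\frac{\overline{\alpha_i :: \kappa_{\alpha_i}} \vdash_c \tau_1 : \hat\tau_1 \mathbin{\&} \xi_1 \triangleright \overline{\beta_j :: \kappa_{\beta_j}} \qquad
      \overline{\alpha_i :: \kappa_{\alpha_i}} \vdash_c \tau_2 : \hat\tau_2 \mathbin{\&} \xi_2 \triangleright \overline{\gamma_k :: \kappa_{\gamma_k}}}
     {\overline{\alpha_i :: \kappa_{\alpha_i}} \vdash_c \tau_1 \times \tau_2 : \hat\tau_1\langle\xi_1\rangle \times \hat\tau_2\langle\xi_2\rangle \mathbin{\&} \beta\ \overline{\alpha_i} \triangleright
      \beta :: \overline{\kappa_{\alpha_i}} \Rightarrow \star,\ \overline{\beta_j :: \kappa_{\beta_j}},\ \overline{\gamma_k :: \kappa_{\gamma_k}}}\ \text{[C-PROD]}
\]

\[
\frac{\emptyset \vdash_c \tau_1 : \hat\tau_1 \mathbin{\&} \xi_1 \triangleright \overline{\beta_j :: \kappa_{\beta_j}} \qquad
      \overline{\alpha_i :: \kappa_{\alpha_i}}, \overline{\beta_j :: \kappa_{\beta_j}} \vdash_c \tau_2 : \hat\tau_2 \mathbin{\&} \xi_2 \triangleright \overline{\gamma_k :: \kappa_{\gamma_k}}}
     {\overline{\alpha_i :: \kappa_{\alpha_i}} \vdash_c \tau_1 \to \tau_2 : \forall\overline{\beta_j :: \kappa_{\beta_j}}.\ \hat\tau_1\langle\xi_1\rangle \to \hat\tau_2\langle\xi_2\rangle \mathbin{\&} \beta\ \overline{\alpha_i} \triangleright
      \beta :: \overline{\kappa_{\alpha_i}} \Rightarrow \star,\ \overline{\gamma_k :: \kappa_{\gamma_k}}}\ \text{[C-ARR]}
\]

Fig. 10: Type completion ($\Sigma \vdash_c \tau : \hat\tau \mathbin{\&} \xi \triangleright \Sigma'$), all $\beta$ fresh, [C-SUM] is like [C-PROD]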

Lastly, we revisit the examples from the previous sections and show how a 
pattern type can be mechanically derived from an underlying type. 

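In example 7 we presented a pattern type for the underlying type
$\mathrm{unit} \to \mathrm{unit} \to \mathrm{unit}$. Using the type completion relation, we can derive the pattern type,


$$
(\forall \beta_1.\ \mathrm{unit}\langle\beta_1\rangle \to (\forall \beta_2.\ \mathrm{unit}\langle\beta_2\rangle \to \mathrm{unit}\langle\beta'\,\beta_1\,\beta_2\rangle)\langle\beta\,\beta_1\rangle) \mathbin{\&} \beta_3
$$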


without having to guess. This is because the components $\hat{\tau}$, $\xi$ and $\Sigma'$ in a judgment
$\Sigma \vdash_c \tau : \hat{\tau} \mathbin{\&} \xi \triangleright \Sigma'$ are uniquely determined by $\Sigma$ and $\tau$ from looking at
the syntax alone. The resulting pattern type contains three pattern variables,
$\beta :: \star \Rightarrow \star$, $\beta' :: \star \Rightarrow \star \Rightarrow \star$ and $\beta_3 :: \star$. If the initial sort environment is empty,
these are also the only free variables of the pattern type.
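The mechanical nature of this derivation can be illustrated with a short Python sketch. It is not taken from the Haskell prototype: the tuple encoding of types and sorts, the function names `complete`, `arrow_sort` and `show_sort`, and the fresh names `b0, b1, ...` are assumptions made for this illustration, and only unit and function types are covered. Under these assumptions the sketch recovers the three pattern-variable sorts named above for the underlying type unit → unit → unit.

```python
from itertools import count

STAR = "*"

def arrow_sort(arg_sorts, result=STAR):
    """Build the sort k1 => ... => kn => result as nested tuples."""
    sort = result
    for k in reversed(arg_sorts):
        sort = ("=>", k, sort)
    return sort

def complete(env, ty, fresh):
    """Return (pattern type, annotation, pattern variables) for `ty`.

    env   : list of (variable, sort) pairs currently in scope
    ty    : "unit" or ("->", argument type, result type)
    fresh : iterator yielding unused variable names
    """
    beta = next(fresh)
    # The annotation is a fresh variable applied to every variable in scope.
    annotation = (beta, [v for v, _ in env])
    own = [(beta, arrow_sort([k for _, k in env]))]
    if ty == "unit":
        return "unit", annotation, own
    _, arg_ty, res_ty = ty
    # The argument is completed in the empty environment; its pattern
    # variables become the quantified variables of the function type.
    arg_pat, arg_ann, arg_vars = complete([], arg_ty, fresh)
    # The result may depend on everything in scope plus the argument's variables.
    res_pat, res_ann, res_vars = complete(env + arg_vars, res_ty, fresh)
    pattern = ("forall", arg_vars, (arg_pat, arg_ann), "->", (res_pat, res_ann))
    return pattern, annotation, own + res_vars

def show_sort(sort):
    if sort == STAR:
        return STAR
    _, k, rest = sort
    left = show_sort(k) if k == STAR else f"({show_sort(k)})"
    return f"{left} => {show_sort(rest)}"

underlying = ("->", "unit", ("->", "unit", "unit"))
_, _, pattern_vars = complete([], underlying, (f"b{i}" for i in count()))
sorts = sorted(show_sort(k) for _, k in pattern_vars)
print(sorts)
```

Because every choice in `complete` is dictated by the shape of the type, two runs can differ only in the names of the fresh variables.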

Based on the type completion relation we can define least type completions.
These are conservative types that are subtypes of all other conservative types of
the same shape. Therefore, all annotations occurring in positive positions on the
top level function arrow chain must also be least. We do not need to consider
arguments here because those are by definition equal up to alpha-conversion due
to being pattern types. We define the least annotation term of sort $\kappa$ as


$$
\bot_{\star} = \bot
\qquad
\bot_{\kappa_1 \Rightarrow \kappa_2} = \lambda \beta :: \kappa_1.\ \bot_{\kappa_2}
$$


These least annotation terms correspond to the least elements of our bounded
lattice for a given sort $\kappa$. This in turn leads us to the definition of the least
completion of type $\tau$ (see figure) by substituting all free variables in the
completion with the least annotation of the corresponding sort, i.e.


$$
\bot_{\tau} = [\overline{\bot_{\kappa_i} / \beta_i}]\,\hat{\tau}
\quad \text{for } \emptyset \vdash_c \tau : \hat{\tau} \mathbin{\&} \xi \triangleright \overline{\beta_i :: \kappa_i}.
$$
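As an illustration of the least annotation term, the following Python sketch builds a value for $\bot_\kappa$ by recursion on the sort. The encoding is an assumption of this example rather than part of the paper: sorts are nested tuples `("=>", k1, k2)` with `"*"` as base sort, annotation functions are Python closures, and the lattice is the two-point binding-time lattice with least element `"S"`.

```python
def least_annotation(sort, bottom="S"):
    """Return a value denoting the least annotation term of the given sort."""
    if sort == "*":
        return bottom
    _, _arg_sort, result_sort = sort  # sort has the form ("=>", k1, k2)
    # The least function ignores its argument and returns the least result.
    return lambda _argument: least_annotation(result_sort, bottom)

binary = ("=>", "*", ("=>", "*", "*"))
least_binary = least_annotation(binary)
print(least_binary("D")("D"))
```

Whatever arguments are supplied, the fully applied term evaluates to the least element, which is what makes the substituted completion the least one of its shape.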


The algorithm We can now move on to the type reconstruction algorithm that
performs the actual analysis. At its core lies algorithm $\mathcal{R}$ shown in figure 11.
The input of the algorithm is a triple $(\hat{\Gamma}, \Sigma, t)$ consisting of a well-typed source
term $t$, an annotated type environment $\hat{\Gamma}$ providing the types and dependency
annotations of the free term variables in $t$ and a sort environment $\Sigma$ mapping
each free annotation variable in scope to its sort. It returns a triple $\hat{t} : \hat{\tau} \mathbin{\&} \xi$
consisting of an elaborated term $\hat{t}$ in the target language (that erases to the
source term $t$), an annotated type $\hat{\tau}$ and a dependency annotation $\xi$ such that
$\Sigma \mid \hat{\Gamma} \vdash_{te} \hat{t} : \hat{\tau} \mathbin{\&} \xi$ holds. In the definition of $\mathcal{R}$, to avoid clutter, we write $\Gamma$
instead of $\hat{\Gamma}$ because we are only dealing with one kind of type environment.

The algorithm relies on the invariant that all types in the type environment
and the inferred type must be conservative. In the version of [16], all inferred
dependency annotations (including those nested as annotations in types) had
to be canonically ordered as well. But it turned out that this canonically
ordered form was not enough for deciding semantic equality, so we lifted this
requirement. We still mark those places in the algorithm where canonicalization
would have occurred with $\llbracket \cdot \rrbracket_\Sigma$, but the actual result of this operation does not
matter as long as the dependency terms remain equivalent.

The algorithm for computing the least upper bound of types ($\sqcup$ in figure 12)
requires that both types are conservative, have the same shape and use the same
names for bound variables. The latter can be ensured by α-conversion while the
former two requirements are fulfilled by how this function is used in $\mathcal{R}$.

The restriction to conservative types allows us to ignore function arguments
because these are always required to be pattern types, which are unique up to
α-equivalence. This alleviates the need for computing a corresponding greatest
lower bound of types, because the algorithm only traverses covariant positions.
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R:AnnTyEnv x SortEnv x Tm > Tm x Ty x AnnTm 


R(T; Y;2) =2:I(x) 
R(T; X; ()) = () : unit & L 
R(T; X; anne(t)) = 

let t: F & € = R(I; X; t) 

in anne (®t) :F& léu Ljs 
R(T; X; seq th t2) = 

let f: A && = R(T; X; t1) 

B: P & éz = R(T; X; t2) 

in seq f R: P & | Us 

R(T; X; (t, t2)) = 


let h: A & & = R(L; X; tr) 
b : Te & &o = R(T; X; t2) 


in (fi, t2) : P (E1) x 72 (E2) & L 
R(T; X; inl, (t)) = 

let t: À & é = R(T; X; t) 

in inln À : Al) + LaL) &L 
R(T; X; inr, (¢)) = 

let T: P & & = R(T; X; t) 

in inr,, (¢) : La (L) +l) & L 
R(T; X; case tı of {inl (z) > tz; 

inr(y) > t3}) = 


let fi : P(E) +P (E) & & = R(T; X; t) 
B: P & é = R(T, £ :F & E; X; t2) 
ig : P & és = R(T, y: PF & ë; X; ta) 


: [UTIs & |é Uo Ues| s 
R(T; X; proj, (t)) = 
let T: 71 (€1) x 72(€2) & E = R(L; X; t) 
in proj, (t) : F & (EU éJ s 
R(I; X; At : T.t) = 
let À & Bp Bi: Ki = C([]; T1) 
V= hz: A&B 


SY =X, Bink 
T: 72 & E2 = R Det t) 
in AB: ri ATT & BE 
:VBi i Ki-T1(B) > Polé) & L 
R(T; X; ty t2) = 
let f : A & &: = R(T; X; ti) 
B: P & é = R(T; X; t2) 
Pa (B) + P(E) > Bi = T(7) 


0 = [B > £2] 0 M([]; Fa; T2) 
in fi (081) 2: LOT] > & Lé U OE] = 
R(T; X; wx: 7.t) = 
do i;7) & £o + 0; 1, & L 
repeat tı Tid & E441 
+ RI, 2: Ti & &; X; t) 
ae itl 
until (Fi-1 = Ti A &i-1 = ĉi) 
return (uz Pi & Eiti) TRE 


in case t of {inl(x) > tz; inr (y) > t } 


Fig. 11: Type reconstruction algorithm (R) 


The handling of λ-abstractions uses the type completion algorithm $\mathcal{C}$ of figure 12,
which defers its work to the type completion relation defined earlier, which
can be interpreted in a functional way (see figure 10). The underlying type of the
function argument is completed to a pattern type. The function body is analyzed
in the presence of the newly introduced pattern variables. Note that this pattern
type is also conservative, thereby preserving the invariant that the context only
holds conservative types. The inferred annotated type of the lambda abstraction
universally quantifies over all pattern variables and the quantification is reflected
on the term level through annotation abstractions $\Lambda \beta :: \kappa.\ t$.

In order to analyze function applications, we need two more auxiliary algorithms.
The first one is the instantiation procedure $\mathcal{I}$ (see figure 12), which
instantiates all top-level quantifiers with fresh annotation variables. The second
is the matching algorithm $\mathcal{M}$ (see figure 12), which instantiates a pattern type
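$$
\begin{array}{l}
\sqcup : \widehat{\mathbf{Ty}} \times \widehat{\mathbf{Ty}} \to \widehat{\mathbf{Ty}} \\
\mathrm{unit} \sqcup \mathrm{unit} = \mathrm{unit} \\
(\hat{\tau}_1\langle\xi_1\rangle \times \hat{\tau}_2\langle\xi_2\rangle) \sqcup (\hat{\tau}_1'\langle\xi_1'\rangle \times \hat{\tau}_2'\langle\xi_2'\rangle) = (\hat{\tau}_1 \sqcup \hat{\tau}_1')\langle\xi_1 \sqcup \xi_1'\rangle \times (\hat{\tau}_2 \sqcup \hat{\tau}_2')\langle\xi_2 \sqcup \xi_2'\rangle \\
(\hat{\tau}_1\langle\beta\rangle \to \hat{\tau}_2\langle\xi_2\rangle) \sqcup (\hat{\tau}_1\langle\beta\rangle \to \hat{\tau}_2'\langle\xi_2'\rangle) = \hat{\tau}_1\langle\beta\rangle \to (\hat{\tau}_2 \sqcup \hat{\tau}_2')\langle\xi_2 \sqcup \xi_2'\rangle \\
(\forall \beta :: \kappa.\ \hat{\tau}) \sqcup (\forall \beta :: \kappa.\ \hat{\tau}') = \forall \beta :: \kappa.\ \hat{\tau} \sqcup \hat{\tau}' \\[8pt]
\mathcal{C} : \mathbf{SortEnv} \times \mathbf{Ty} \to \widehat{\mathbf{Ty}} \times \mathbf{AnnTm} \times \mathbf{SortEnv} \\
\mathcal{C}(\Sigma; \tau) = \hat{\tau} \mathbin{\&} \xi \triangleright \overline{\beta_i :: \kappa_i} \quad \text{where } \Sigma \vdash_c \tau : \hat{\tau} \mathbin{\&} \xi \triangleright \overline{\beta_i :: \kappa_i} \\[8pt]
\mathcal{I} : \widehat{\mathbf{Ty}} \to \widehat{\mathbf{Ty}} \times \mathbf{SortEnv} \\
\mathcal{I}(\forall \beta :: \kappa.\ \hat{\tau}) = \mathbf{let}\ \hat{\tau}' \triangleright \Sigma = \mathcal{I}(\hat{\tau})\ \mathbf{in}\ [\beta \mapsto \beta'](\hat{\tau}') \triangleright \beta' :: \kappa, \Sigma \quad \text{where } \beta' \text{ be fresh} \\
\mathcal{I}(\hat{\tau}) = \hat{\tau} \triangleright [] \\[8pt]
\mathcal{M} : \mathbf{SortEnv} \times \widehat{\mathbf{Ty}} \times \widehat{\mathbf{Ty}} \to \mathbf{AnnSubst} \\
\mathcal{M}(\Sigma; \mathrm{unit}; \mathrm{unit}) = [] \\
\mathcal{M}(\Sigma; \hat{\tau}_1'\langle\beta\,\overline{\beta_i}\rangle \times \hat{\tau}_2'\langle\beta'\,\overline{\beta_i}\rangle; \hat{\tau}_1\langle\xi_1\rangle \times \hat{\tau}_2\langle\xi_2\rangle) = \\
\quad [\beta \mapsto \lambda \overline{\beta_i :: \Sigma(\beta_i)}.\ \xi_1,\ \beta' \mapsto \lambda \overline{\beta_i :: \Sigma(\beta_i)}.\ \xi_2] \circ \mathcal{M}(\Sigma; \hat{\tau}_1'; \hat{\tau}_1) \circ \mathcal{M}(\Sigma; \hat{\tau}_2'; \hat{\tau}_2) \\
\mathcal{M}(\Sigma; \hat{\tau}_1\langle\beta\rangle \to \hat{\tau}_2'\langle\beta'\,\overline{\beta_i}\rangle; \hat{\tau}_1\langle\beta\rangle \to \hat{\tau}_2\langle\xi\rangle) = \\
\quad [\beta' \mapsto \lambda \overline{\beta_i :: \Sigma(\beta_i)}.\ \xi] \circ \mathcal{M}(\Sigma; \hat{\tau}_2'; \hat{\tau}_2) \\
\mathcal{M}(\Sigma; \forall \beta :: \kappa.\ \hat{\tau}'; \forall \beta :: \kappa.\ \hat{\tau}) = \mathcal{M}(\Sigma, \beta :: \kappa; \hat{\tau}'; \hat{\tau})
\end{array}
$$


Fig. 12: Least upper bound of types ($\sqcup$), completion ($\mathcal{C}$), instantiation ($\mathcal{I}$), and
matching ($\mathcal{M}$). Rules for $\cdot + \cdot$ in $\sqcup$ and $\mathcal{M}$ are like those for $\cdot \times \cdot$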


with a conservative type of the same shape. It returns a substitution obtained 
by performing pattern unification on corresponding annotations. 


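Soundness and Completeness An annotated type environment $\hat{\Gamma}$ is well-formed
under an environment $\Sigma$ if $\hat{\Gamma}$ is conservative and for all bindings $x : \hat{\tau} \mathbin{\&} \xi$
in $\hat{\Gamma}$ we have $\Sigma \vdash_{wft} \hat{\tau}$ and $\Sigma \vdash_s \xi : \star$.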

In order to demonstrate the correctness of the reconstruction algorithm pre- 
sented in this section we have to show that for every well-typed underlying term, 
it produces an analysis (i.e. annotated types and dependency annotations) that 
can be derived in the annotated type system (see figure 6). That is to say, algo- 
rithm R is sound w.r.t. the annotated type system. 


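Theorem 5. Let $t$ be a source term, $\Sigma$ a sort environment and $\hat{\Gamma}$ an annotated
type environment well-formed under $\Sigma$ such that $\mathcal{R}(\hat{\Gamma}; \Sigma; t) = \hat{t} : \hat{\tau} \mathbin{\&} \xi$ for some
$\hat{t}$, $\hat{\tau}$ and $\xi$.

Then, $\Sigma \mid \hat{\Gamma} \vdash_{te} \hat{t} : \hat{\tau} \mathbin{\&} \xi$, $\Sigma \vdash_{wft} \hat{\tau}$, $\Sigma \vdash_s \xi : \star$ and $\hat{\tau}$ is conservative.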


The next step is to show that our analysis succeeds in deriving an annotated 
type and dependency annotation for any well-typed source term: it is complete. 
The crucial part here is the termination of the fixpoint iteration. In order to show
the convergence of the fixpoint iteration, we start by defining an equivalence
relation on annotated type and dependency pairs.

Our type reconstruction algorithm handles polymorphic recursion through
Kleene-Mycroft iteration. Such an algorithm is based on fixpoint iteration and
needs a way to decide whether two dependency terms are equal according to the
denotational semantics of $\lambda^{\sqcup}$.
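The shape of such an iteration can be sketched generically in Python. This is not algorithm $\mathcal{R}$ itself: `step` is a hypothetical stand-in for re-analysing the body of a recursive definition under the previous approximation, `equivalent` stands in for the semantic equality test, and the `max_rounds` guard is added only for this illustration.

```python
def kleene_iterate(step, bottom, equivalent, max_rounds=1000):
    """Iterate `step` from `bottom` until two successive results are equivalent."""
    current = bottom
    for _ in range(max_rounds):
        following = step(current)
        if equivalent(current, following):
            return following
        current = following
    raise RuntimeError("no fixpoint reached within max_rounds")

# Toy instance on the finite lattice of subsets of {0, 1, 2, 3}:
# each round adds 0 and the successors of the elements found so far.
def grow(approximation):
    successors = {x + 1 for x in approximation if x < 3}
    return approximation | {0} | successors

fixpoint = kleene_iterate(grow, frozenset(), lambda a, b: a == b)
print(sorted(fixpoint))
```

The loop terminates here because `grow` is monotone on a finite lattice; the equality test is the only place where the iteration inspects the approximations, which is why its decidability matters.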

A straightforward way to decide semantic equivalence is to enumerate all 
possible environments and compare the denotations of the two terms in all of 
these (possibly after some semantics preserving normalization). This only works 
if the dependency lattice $\mathcal{L}$ is finite.

For some analyses, e.g., the set of all program locations in a slicing analysis, 
L = V, is finite but large, and deciding equality in this fashion becomes imprac- 
tical. To alleviate this problem, our prototype implementation applies a partial 
canonicalization procedure which, while not complete, can serve as an approxi- 
mation of equality: if two canonicalized dependency terms become syntactically 
equal, then we can be assured that they are semantically equal, but if they are 
not we can still apply the above procedure to the canonicalized dependency 
terms. We omit formal details from the paper. 
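The combination of a cheap syntactic check with exhaustive comparison can be sketched as follows. The sketch is deliberately restricted and is not the prototype's procedure: it assumes the two-point binding-time lattice, first-order terms built from constants, variables of sort $\star$ and joins, and a tuple encoding of terms invented for this example.

```python
from itertools import product

LATTICE = ["S", "D"]  # S is below D

def join(a, b):
    return "D" if "D" in (a, b) else "S"

def evaluate(term, env):
    """Denotation of a term: ("const", c), ("var", name) or ("join", t1, t2)."""
    kind = term[0]
    if kind == "const":
        return term[1]
    if kind == "var":
        return env[term[1]]
    return join(evaluate(term[1], env), evaluate(term[2], env))

def free_vars(term):
    kind = term[0]
    if kind == "const":
        return set()
    if kind == "var":
        return {term[1]}
    return free_vars(term[1]) | free_vars(term[2])

def semantically_equal(t1, t2):
    # Syntactic equality is sufficient but not necessary, so try it first.
    if t1 == t2:
        return True
    names = sorted(free_vars(t1) | free_vars(t2))
    for values in product(LATTICE, repeat=len(names)):
        env = dict(zip(names, values))
        if evaluate(t1, env) != evaluate(t2, env):
            return False
    return True

x, y = ("var", "x"), ("var", "y")
print(semantically_equal(("join", x, y), ("join", y, x)))
print(semantically_equal(x, y))
```

The number of environments grows exponentially in the number of free variables and with the size of the lattice, which is the cost the partial canonicalization is meant to avoid in the common case.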

We can now state our completeness results for the type reconstruction algorithm.
Here, we write $\Gamma \vdash_t t : \tau$ to say that term $t$ has type $\tau$ under the
environment $\Gamma$ in the underlying type system.


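Theorem 6 (Completeness). Given a source term $t$, a sort environment $\Sigma$,
an annotated type environment $\hat{\Gamma}$ well-formed under $\Sigma$, and an underlying type
$\tau$ such that $\lfloor\hat{\Gamma}\rfloor \vdash_t t : \tau$, then there are $\hat{t}$, $\hat{\tau}$ and $\xi$ such that $\mathcal{R}(\hat{\Gamma}; \Sigma; t) = \hat{t} : \hat{\tau} \mathbin{\&} \xi$
and $\lfloor\hat{\tau}\rfloor = \tau$, $\lfloor\hat{t}\rfloor = t$.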


As a corollary of the foregoing theorems, our analysis is a conservative ex- 
tension of the underlying type system. 


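Corollary 2 (Conservative Extension). Let $t$ be a source term, $\tau$ be a type
and $\Gamma$ a type environment such that $\Gamma \vdash_t t : \tau$. Then there are $\Sigma$, $\hat{\Gamma}$, $\hat{t}$, $\hat{\tau}$, $\xi$ such
that $\Sigma \mid \hat{\Gamma} \vdash_{te} \hat{t} : \hat{\tau} \mathbin{\&} \xi$ with $\lfloor\hat{t}\rfloor = t$, $\lfloor\hat{\tau}\rfloor = \tau$ and $\lfloor\hat{\Gamma}\rfloor = \Gamma$.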


7 Implementation and Examples 


Beyond the definition of the annotated system and the development of the associated
algorithm and meta-theory we also have a REPL prototype implementation
of our analysis in Haskell. Compared to the annotated type system in the paper,
the prototype provides support for booleans and integers, including literals and
conditionals $\mathbf{if}\ c\ \mathbf{then}\ t_1\ \mathbf{else}\ t_2$ for which the type rules can be straightforwardly
derived. Concrete lattice implementations are provided only for binding-time
analysis and security analysis, but the reconstruction algorithm abstracts
away from the choice for a particular lattice, so it is easy to add new instances.


The implementation is available at http://www.staff.science.uu.nl/~hage0101/prototype-hrp.zip.
Below we walk through a few examples, taking advantage of
the slightly extended source language that our implementation supports. More
(detailed) examples are discussed in [26].


Construction and Elimination Whenever something is constructed, be it a 
product, a sum or a lambda abstraction, the outermost dependency annotation 
is $\bot$. This is because the analysis aims to produce the best possible and thereby
least annotations for a given source program. 

Consider the case of binding-time analysis, and suppose we have a variable of
function type $f : \forall \beta.\ \mathrm{int}\langle\beta\rangle \to \mathrm{int}\langle\beta\rangle \mathbin{\&} D$. We can see that it preserves the annotations
of its arguments, i.e. if we apply $f$ to a static value, the return annotation
is also instantiated to be static. The function itself, however, is dynamic. And 
therefore, the whole result of the function application must also be dynamic, 
because we cannot know which particular function has been assigned to f. 

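Elimination always introduces a dependency in the program, and this can
uncover subtleties arising when functions only differ in their termination behavior.
For example, compare $\lambda p : \mathrm{int} \times \mathrm{int}.\ p$ with $\lambda p : \mathrm{int} \times \mathrm{int}.\ (\mathrm{proj}_1(p), \mathrm{proj}_2(p))$.
In a call-by-value language, these two functions would be (extensionally) equivalent.
However, with non-strict evaluation, $p$ might be a non-terminating computation.
In that case, applying the former function would diverge, while the
latter function at least produces the pair constructor. This is also reflected in
the annotated types that are inferred. For the former, we get


$$
\begin{aligned}
& \forall \beta_0, \beta_1, \beta_2 :: \star.\ (\mathrm{int}\langle\beta_0\rangle \times \mathrm{int}\langle\beta_1\rangle)\langle\beta_2\rangle \to (\mathrm{int}\langle\beta_0\rangle \times \mathrm{int}\langle\beta_1\rangle)\langle\beta_2\rangle \mathbin{\&} S, \text{ and} \\
& \forall \beta_0, \beta_1, \beta_2 :: \star.\ (\mathrm{int}\langle\beta_0\rangle \times \mathrm{int}\langle\beta_1\rangle)\langle\beta_2\rangle \to (\mathrm{int}\langle\beta_0 \sqcup \beta_2\rangle \times \mathrm{int}\langle\beta_1 \sqcup \beta_2\rangle)\langle S\rangle \mathbin{\&} S
\end{aligned}
$$


for the latter. In particular, the annotation of the product in the second type
signature is $S$. Therefore, it cannot depend on the input of the function.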


Polymorphic Recursion One class of functions where the analysis benefits 
from polymorphic recursion are those that permute their arguments on recursive 
calls. Our example is a slightly modified version of an example from [5]: 


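$$
\mu f : \mathrm{bool} \to \mathrm{bool} \to \mathrm{bool}.\ \lambda x : \mathrm{bool}.\ \lambda y : \mathrm{bool}.\ \mathbf{if}\ x\ \mathbf{then}\ \mathrm{true}\ \mathbf{else}\ f\ y\ x
$$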


In an analysis with monomorphic recursion, the analysis assigns the same annotation
to both parameters, large enough to accommodate both arguments.
This is due to the permutation of the arguments in the else branch. An analysis
with polymorphic recursion is allowed to use a different instantiation for $f$ in
that case. Our algorithm hence infers the following most general type.


$$
\forall \beta_1 :: \star.\ \mathrm{bool}\langle\beta_1\rangle \to (\forall \beta_2 :: \star.\ \mathrm{bool}\langle\beta_2\rangle \to \mathrm{bool}\langle\beta_1 \sqcup \beta_2\rangle)\langle\bot\rangle \mathbin{\&} \bot
$$


We see that the result of the function indeed depends on the annotations of
both arguments, as both end up in the condition of the if-expression at some
point. Yet, both arguments are completely unrestricted, and unrelated in their
annotations. In contrast, a type system with monomorphic recursion would only
admit a weaker type, possibly similar to


$$
\forall \beta_1 :: \star.\ \mathrm{bool}\langle\beta_1\rangle \to (\mathrm{bool}\langle\beta_1\rangle \to \mathrm{bool}\langle\beta_1\rangle)\langle\bot\rangle \mathbin{\&} \bot
$$


A real world example of this kind is Euclid’s algorithm for computing the
greatest common divisor (see [26]).


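Higher-Ranked Polyvariance This section discusses several examples for the
dependency analysis instance of binding-time analysis, comparing our outcomes
with a let-polyvariant analysis [29].

A simple example to start with is a function that applies a function to both
components of a pair.²


$$
\begin{aligned}
\mathit{both} &: (\mathrm{int} \to \mathrm{int}) \to \mathrm{int} \times \mathrm{int} \to \mathrm{int} \times \mathrm{int} \\
\mathit{both} &= \lambda f : \mathrm{int} \to \mathrm{int}.\ \lambda p : \mathrm{int} \times \mathrm{int}.\ (f\ (\mathrm{proj}_1(p)),\ f\ (\mathrm{proj}_2(p)))
\end{aligned}
$$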


Suppose in the context of binding-time analysis that both is used to apply a 
statically known function to a pair whose first component is always computable 
at compile time, but whose second component is dynamic. For simplicity’s sake, 
the function is the identity on integers. 


$$
\begin{aligned}
\mathit{id} &: \mathrm{int} \to \mathrm{int} \\
\mathit{id} &= \lambda x : \mathrm{int}.\ x
\end{aligned}
$$


A non-higher-ranked analysis would assign types to both and id. The annotation
on the function argument to both must be large enough to accommodate
both components of the pair as input. When we consider the call both id p for
some pair $p : \mathrm{int}\langle S\rangle \times \mathrm{int}\langle D\rangle \mathbin{\&} S$, the whole call has the type $\mathrm{int}\langle D\rangle \times \mathrm{int}\langle D\rangle$.

Our higher-ranked analysis infers the following conservative types for id and 
both. 


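$$
\begin{aligned}
\mathit{id} &: \forall \beta :: \star.\ \mathrm{int}\langle\beta\rangle \to \mathrm{int}\langle\beta\rangle \mathbin{\&} \bot \\
\mathit{id} &= \Lambda \beta :: \star.\ \lambda x : \mathrm{int} \mathbin{\&} \beta.\ x
\end{aligned}
$$


$$
\begin{aligned}
\mathit{both} :\ & \forall \beta_1 :: \star.\ \forall \beta_2 :: \star \Rightarrow \star.\ (\forall \beta :: \star.\ \mathrm{int}\langle\beta\rangle \to \mathrm{int}\langle\beta_2\,\beta\rangle)\langle\beta_1\rangle \\
& \to (\forall \beta_3, \beta_4, \beta_5 :: \star.\ (\mathrm{int}\langle\beta_3\rangle \times \mathrm{int}\langle\beta_4\rangle)\langle\beta_5\rangle \\
& \quad \to (\mathrm{int}\langle\beta_2\,(\beta_3 \sqcup \beta_5) \sqcup \beta_1\rangle \times \mathrm{int}\langle\beta_2\,(\beta_4 \sqcup \beta_5) \sqcup \beta_1\rangle)\langle S\rangle)\langle S\rangle \mathbin{\&} S \\
\mathit{both} =\ & \Lambda \beta_1 :: \star.\ \Lambda \beta_2 :: \star \Rightarrow \star.\ \lambda f : (\forall \beta :: \star.\ \mathrm{int}\langle\beta\rangle \to \mathrm{int}\langle\beta_2\,\beta\rangle). \\
& \Lambda \beta_3 :: \star.\ \Lambda \beta_4 :: \star.\ \Lambda \beta_5 :: \star.\ \lambda p : \mathrm{int}\langle\beta_3\rangle \times \mathrm{int}\langle\beta_4\rangle. \\
& (f\ \langle\beta_3 \sqcup \beta_5\rangle\ (\mathrm{proj}_1(p)),\ f\ \langle\beta_4 \sqcup \beta_5\rangle\ (\mathrm{proj}_2(p)))
\end{aligned}
$$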


In case of both, the function parameter f can be instantiated separately for each
component because our analysis assigns it a type that universally quantifies over
the annotation of its argument. It is evident from the type signature that the
components of the resulting pair only depend on the corresponding components
of the input pair, and the function and the input pair itself. They do not depend
on the respective other component of the input.


² NB. both is a simplified instance of a traversal ∀f. Applicative f ⇒ (Int → f Int) →
(Int, Int) → f (Int, Int), in order to fit the restrictions of the source language [6, 15].

If we again consider the call both id p, we obtain $\beta_2 = \lambda \beta :: \star.\ \beta$, $\beta_1 = \beta_3 =
\beta_5 = S$ and $\beta_4 = D$ through pattern unification. Normalization of the resulting
dependency terms results in the expected return type $\mathrm{int}\langle S\rangle \times \mathrm{int}\langle D\rangle$.
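The normalization step for this call can be replayed by hand. The following Python sketch evaluates the two result annotations of both under the substitution found by pattern unification; the string encoding of the two-point lattice, the helper `join` and the variable names are assumptions of this illustration.

```python
def join(*annotations):
    """Least upper bound in the binding-time lattice with S below D."""
    return "D" if "D" in annotations else "S"

# Substitution obtained for the call `both id p`.
beta2 = lambda beta: beta          # effect operator of id
beta1 = beta3 = beta5 = "S"
beta4 = "D"

# Result annotations read off the type of both.
first = join(beta2(join(beta3, beta5)), beta1)
second = join(beta2(join(beta4, beta5)), beta1)
print(first, second)

# A single annotation for the argument of f must cover both components.
shared_argument = join(beta3, beta4)
rank1_first = rank1_second = join(beta2(shared_argument), beta1)
print(rank1_first, rank1_second)
```

The last three lines mimic the non-higher-ranked situation described earlier, where the function argument is given one annotation for both uses and the static first component is lost.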

The generality provided by the higher-ranked analysis extends to an arbitrar- 
ily deep nesting of function arrows. The following example demonstrates this for 
two levels of arrows. Functions with more than two levels of arrows can arise 
directly in actual programs, but even more so in desugared code, e.g., when type 
classes in Haskell are implemented via explicit dictionary passing. Due to limi- 
tations of our source language, the examples are syntactically heavily restricted. 

Consider the following function that takes a function argument which again 
requires a function. 


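$$
\begin{aligned}
\mathit{foo} &: ((\mathrm{int} \to \mathrm{int}) \to \mathrm{int}) \to \mathrm{int} \times \mathrm{int} \\
\mathit{foo} &= \lambda f : (\mathrm{int} \to \mathrm{int}) \to \mathrm{int}.\ (f\ (\lambda x : \mathrm{int}.\ x),\ f\ (\lambda x : \mathrm{int}.\ 0))
\end{aligned}
$$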


The higher-ranked analysis infers the following type and target term (where we 
omitted the type in the argument of the lambda term because it essentially 
repeats what is already visible in the top level type signature). 


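$$
\begin{aligned}
\mathit{foo} :\ & \forall \beta_4 :: \star.\ \forall \beta_3 :: \star \Rightarrow (\star \Rightarrow \star) \Rightarrow \star. \\
& (\forall \beta_2 :: \star.\ \forall \beta_1 :: \star \Rightarrow \star.\ (\forall \beta_0 :: \star.\ \mathrm{int}\langle\beta_0\rangle \to \mathrm{int}\langle\beta_1\,\beta_0\rangle)\langle\beta_2\rangle \\
& \quad \to \mathrm{int}\langle\beta_3\,\beta_2\,\beta_1\rangle)\langle\beta_4\rangle \\
& \to (\mathrm{int}\langle\beta_3\,S\,(\lambda \beta_5 :: \star.\ \beta_5) \sqcup \beta_4\rangle \times \mathrm{int}\langle\beta_3\,S\,(\lambda \beta_6 :: \star.\ S) \sqcup \beta_4\rangle)\langle S\rangle \mathbin{\&} S \\
\mathit{foo} =\ & \Lambda \beta_4 :: \star.\ \Lambda \beta_3 :: \star \Rightarrow (\star \Rightarrow \star) \Rightarrow \star.\ \lambda f : \cdots. \\
& (f\ \langle S\rangle\ \langle\lambda \beta_0 :: \star.\ \beta_0\rangle\ (\Lambda \beta_5 :: \star.\ \lambda x : \mathrm{int} \mathbin{\&} \beta_5.\ x), \\
& \ \ f\ \langle S\rangle\ \langle\lambda \beta_0 :: \star.\ S\rangle\ (\Lambda \beta_6 :: \star.\ \lambda x : \mathrm{int} \mathbin{\&} \beta_6.\ 1))
\end{aligned}
$$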


Since the type of f is a pattern type, the argument to f is also a pattern type by
definition. Therefore, the analysis of f depends on the analysis of the function
passed to it. This gives rise to the higher-order effect operator $\beta_3$ [12]. Thus, f
can be applied to any function with a conservative type of the right shape. As our
algorithm always infers conservative types, the type of f is as general as possible.
This is reflected in the body of the lambda where in both cases f is instantiated
with the dependency annotation corresponding to the function passed to it. The
result of this instantiation can be observed in the returned product type where
$\beta_3$ is applied to the effect operators $\lambda \beta_0 :: \star.\ \beta_0$ and $\lambda \beta_0 :: \star.\ S$ corresponding to
the respective functions used as arguments to f.

Only when we finally apply foo, the resulting annotations can be evaluated. 


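$$
\begin{aligned}
\mathit{bar} :\ & \forall \alpha_2 :: \star.\ \forall \alpha_1 :: \star \Rightarrow \star.\ (\forall \alpha_0 :: \star.\ \mathrm{int}\langle\alpha_0\rangle \to \mathrm{int}\langle\alpha_1\,\alpha_0\rangle)\langle\alpha_2\rangle \\
& \to \mathrm{int}\langle\alpha_1\,D \sqcup \alpha_2\rangle \mathbin{\&} S \\
\mathit{bar} =\ & \Lambda \alpha_2 :: \star.\ \Lambda \alpha_1 :: \star \Rightarrow \star.\ \lambda f : \cdots.\ f\ (\mathrm{ann}_D(0))
\end{aligned}
$$

For bar we obtain $\mathit{foo}\ \mathit{bar} : \mathrm{int}\langle D\rangle \times \mathrm{int}\langle S\rangle \mathbin{\&} S$. In this case,
$\beta_3 = \lambda \beta_2 :: \star.\ \lambda \beta_1 :: \star \Rightarrow \star.\ \beta_1\,D \sqcup \beta_2$, because bar applies its argument to a value with
dynamic binding time. This causes the first component of the returned pair to
be deemed dynamic as well. On the other hand, in the second component bar
is applied to a constant function. Thus, regardless of the argument’s dynamic
binding time, the resulting binding time is static. In a rank-1 system we would
get $\mathrm{int}\langle D\rangle \times \mathrm{int}\langle D\rangle$ instead of $\mathrm{int}\langle D\rangle \times \mathrm{int}\langle S\rangle$.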
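The evaluation of the higher-order effect operator can be checked with a few lines of Python. As before, this is only an illustration: annotation-level functions are modelled as Python closures over the two-point lattice, `join` is a helper introduced here, and `beta4 = "S"` is read off the outermost annotation of bar.

```python
def join(*annotations):
    """Least upper bound in the binding-time lattice with S below D."""
    return "D" if "D" in annotations else "S"

# Instantiation for the call `foo bar`: bar applies its argument to a
# dynamic value, so its effect operator feeds D into the argument's operator.
beta3 = lambda b2: lambda b1: join(b1("D"), b2)
beta4 = "S"

identity_effect = lambda b5: b5    # effect operator of the identity function
constant_effect = lambda b6: "S"   # effect operator of the constant function

first = join(beta3("S")(identity_effect), beta4)
second = join(beta3("S")(constant_effect), beta4)
print(first, second)
```

The identity function propagates the dynamic argument to its result, whereas the constant function discards it, which is why only the first component becomes dynamic.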


8 Related Work 


The basis for most type systems of functional programming languages is the 
Hindley-Milner type system [22]. Our algorithm R strongly resembles the well- 
known type inference algorithm for the Hindley-Milner type system, Algorithm 
W [B], a distinct advantage of our approach. The idea to define an annotated 
type system as a means to design static analyses for higher-order languages is 
attributed to [19]. The major technical difference compared to a let-polyvariant
analysis is that our annotations form a simply typed lambda-calculus. 

Full reconstruction for a higher-ranked polyvariant annotated type system
was first considered by [12] in the context of a control-flow analysis. However,
we found that the (constraint-based) algorithm as presented in [12] generates
constraints free of cycles. Therefore, it cannot faithfully reflect the constraints
necessary for the fixpoint combinator. The algorithm incorrectly concludes for
the following example that only the first and third ‘False’ term flow into the
condition x, but not the second one.


$$
(\mathbf{fix}\ (\lambda f.\ \lambda x.\ \lambda y.\ \lambda z.\ \mathbf{if}\ x\ \mathbf{then}\ \mathrm{True}\ \mathbf{else}\ f\ z\ x\ y))\ \mathrm{False}\ \mathrm{False}\ \mathrm{False}
$$


We reproduced this mistake with their implementation and verified that the 
mistake was not a simple bug in that implementation. 

Close to our formulation is the (unpublished) work of [16], which deals with
exception analysis and uses a simply typed lambda-calculus with sets to represent
annotations. We have chosen a more modular approach in which we offload
much of the complexity of dealing with lattice values to the lattice. In [16], terms
from the simply typed lambda-calculus with sets are canonicalized and then
checked for alpha equivalence during Kleene-Mycroft iteration. We found, however,
that two terms can have different canonical forms even though they are
actually semantically equivalent. This causes Koot’s reconstruction algorithm
to diverge on a particular class of programs, because the inferred annotations
continue to grow. The simplest such program we found is the following.


$$
\mu f : (\mathrm{unit} \to \mathrm{unit}) \to \mathrm{unit} \to \mathrm{unit}.\ \lambda g : \mathrm{unit} \to \mathrm{unit}.\ \lambda x : \mathrm{unit}.\ g\ (f\ g\ x)
$$


Our solution is to apply canonicalization to simplify terms as much as pos- 
sible, and then compare the outcomes for all possible inputs. 

The Dependency Core Calculus was introduced by Abadi et al. as a unifying framework
for dependency analyses. Instances include binding-time analysis (see, e.g.,
[29]), exception analysis [17, 16], secure information flow analysis [9] and static
slicing [27]. They devised the Dependency Core Calculus (DCC) to which each
instance of a dependency analysis can be mapped. This allowed them to compare
different dependency analyses, uncover problems with existing instance analyses
and to simplify proofs of noninterference [8, 20]. The instance analyses in
that work were defined as a monovariant type and effect system with subtyping, for a
monomorphic call-by-name language. An implicit, let-polymorphic implementation
of DCC, FlowCaml, was developed by [25]. It is not higher-ranked.

The difference between DCC and our analysis is to a large extent a different 
focus: the DCC is a calculus defined in a way that any calculus that elaborates 
to DCC has the noninterference property and any other properties proven for 
the calculus. On the other hand, our analysis is meant to be implemented in a 
compiler (with the added precision), and that implementation (and its associated 
meta-theory) can then be reused inside the compiler for a variety of analyses. 
Comparable to DCC, we have proven a noninterference property for our generic 
higher-rank polyvariant dependency analysis, so that all its instances inherit it. 

The Haskell community supports an implementation of DCC in which the 
(security) annotations are lifted to the Haskell type level [2]. Since the GHC 
compiler supports higher-rank types, the code written with this library can in 
fact model security flows with higher-rank. Because of the general undecidability 
of full reconstruction for higher-rank types [14], the programmer must however 
provide explicit type information. In [18], the authors introduce dependent flow
types, which allow them to express a large variety of security policies. An essential
difference with our work is that our approach is fully automated.

Early on in our research, we observed that the approach of [11] may lead to
similar precision gains as higher-ranked annotations do. Since they deal with a 
different analysis, a direct comparison is impossible to make at this time. 


9 Conclusion and Future Work 


We have defined a higher-rank annotation polymorphic type system for a generic 
dependency analysis, established its soundness and provided a sound and com- 
plete reconstruction algorithm. Examples show that we can achieve higher pre- 
cision than plain let-polyvariance. The analysis we have defined is for a call-by- 
name language. We expect the results to hold as well for a lazy language, but 
chose call-by-name for reduced bookkeeping in the proofs. We also believe the 
analysis can be adapted relatively easily to one for a call-by-value language, by 
letting the annotation on the argument flow into the effect of the call. However, 
we would need to re-examine the metatheory. 

In future work we want to consider whether we can further refine the canonicalization
of $\lambda^{\sqcup}$ terms so that syntactic equality up to alpha-equivalence can
completely replace our current approach.
Abstract. We present CONSORT, a type system for safety verification
in the presence of mutability and aliasing. Mutability requires strong 
updates to model changing invariants during program execution, but 
aliasing between pointers makes it difficult to determine which invariants 
must be updated in response to mutation. Our type system addresses 
this difficulty with a novel combination of refinement types and fractional 
ownership types. Fractional ownership types provide flow-sensitive and 
precise aliasing information for reference variables. CONSORT interprets 
this ownership information to soundly handle strong updates of potentially 
aliased references. We have proved CONSORT sound and implemented a 
prototype, fully automated inference tool. We evaluated our tool and found 
it verifies non-trivial programs including data structure implementations. 


Keywords: refinement types, mutable references, aliasing, strong up- 
dates, fractional ownerships, program verification, type systems 


1 Introduction 


Driven by the increasing power of automated theorem provers and recent high- 
profile software failures, fully automated program verification has seen a surge 
of interest in recent years [5, 10, 15, 29, 38, 66]. In particular, refinement types 
[9, 21, 24, 65], which refine base types with logical predicates, have been shown to
be a practical approach for program verification that is amenable to (sometimes
full) automation [47, 61, 62, 63]. Despite promising advances [26, 32, 46], the sound
and precise application of refinement types (and program verification in general) 
in settings with mutability and aliasing (e.g., Java, Ruby, etc.) remains difficult. 

One of the major challenges is how to precisely and soundly support strong 
updates for the invariants on memory cells. In a setting with mutability, a single 
invariant may not necessarily hold throughout the lifetime of a memory cell; while 
the program mutates the memory the invariant may change or evolve. To model 
these changes, a program verifier must support different, incompatible invariants 
which hold at different points during program execution. Further, precise program 
verification requires supporting different invariants on distinct pieces of memory. 


© The Author(s) 2020 
 1  mk(n) { mkref n }
 2
 3  let p = mk(3) in
 4  let q = mk(5) in
 5  p := *p + 1;
 6  q := *q + 1;
 7  assert(*p = 4);

Fig. 1. Example demonstrating the difficulty of effecting strong updates in the
presence of aliasing. The function mk is bound in the program from lines 3 to 7;
its body is given within the braces.


 1  loop(a, b) {
 2    let aold = *a in
 3    b := *b + 1;
 4    a := *a + 1;
 5    assert(*a = aold + 1);
 6    if ⋆ then
 7      loop(b, mkref ⋆)
 8    else
 9      loop(b, a)
10  }
11  loop(mkref ⋆, mkref ⋆)

Fig. 2. Example with non-trivial aliasing behavior.


One solution is to use refinement types on the static program names (i.e., 
variables) which point to a memory location. This approach can model evolving 
invariants while tracking distinct invariants for each memory cell. For example, 
consider the (contrived) example in Figure 1. This program is written in an ML- 
like language with mutable references; references are updated with := and allo- 
cated with mkref. Variable p can initially be given the type {v:int | v = 3} ref, 
indicating it is a reference to the integer 3. Similarly, q can be given the type 
{v:int |v = 5}ref. We can model the mutation of p’s memory on line 5 by 
strongly updating p’s type to {v : int |v = 4} ref. 

Unfortunately, the precise application of this technique is confounded by the 
existence of unrestricted aliasing. In general, updating just the type of the mutated 
reference is insufficient: due to aliasing, other variables may point to the mutated 
memory and their refinements must be updated as well. However, in the presence 
of conditional, may aliasing, it is impossible to strongly update the refinements on 
all possible aliases; given the static uncertainty about whether a variable points to 
the mutated memory, that variable’s refinement may only be weakly updated. For 
example, suppose we used a simple alias analysis that imprecisely (but soundly) 
concluded all references allocated at the same program point might alias. Variables 
p and q share the allocation site on line 1, so on line 5 we would have to weakly 
update q’s type to {v:int | v = 4 ∨ v = 5}, indicating it may hold either 4 or
5. Under this same imprecise aliasing assumption, we would also have to weakly 
update p’s type on line 6, preventing the verification of the example program. 
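The precision loss can be made concrete with a small Python sketch that tracks, for each variable of Figure 1, the set of values its refinement admits. This models neither CONSORT nor any particular alias analysis: the helper names, the set-based encoding of refinements and the `may_alias` map are assumptions made for this illustration.

```python
def strong_update(refinements, target, new_values):
    """Replace the refinement of `target` outright."""
    updated = dict(refinements)
    updated[target] = set(new_values)
    return updated

def weak_update(refinements, target, new_values):
    """Keep the old possibilities of `target` and add the new ones."""
    updated = dict(refinements)
    updated[target] = refinements[target] | set(new_values)
    return updated

def increment(refinements, written, may_alias):
    """Model `written := *written + 1` under the given may-alias information."""
    new_values = {v + 1 for v in refinements[written]}
    result = strong_update(refinements, written, new_values)
    for other in may_alias.get(written, ()):
        result = weak_update(result, other, new_values)
    return result

initial = {"p": {3}, "q": {5}}

# Imprecise analysis: p and q share an allocation site, so they may alias.
imprecise = {"p": ["q"], "q": ["p"]}
after_line_5 = increment(initial, "p", imprecise)
after_line_6 = increment(after_line_5, "q", imprecise)
print(after_line_5["q"], after_line_6["p"])

# Precise information: p and q never alias.
precise = increment(increment(initial, "p", {}), "q", {})
print(precise["p"])
```

With the imprecise information the refinement of p at the assertion admits values other than 4, so the assertion cannot be verified; with must-not-alias information it admits exactly 4.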

Given the precision loss associated with weak updates, it is critical that 
verification techniques built upon refinement types use precise aliasing information 
and avoid spuriously applied weak updates. Although it is relatively simple to 
conclude that p and q do not alias in Figure 1, consider the example in Figure 2. 
(In this example, ⋆ represents non-deterministic values.) Verifying this program
requires proving a and b never alias at the writes on lines 3 and 4. In fact, a 
and b may point to the same memory location, but only in different invocations 
of loop; this pattern may confound even sophisticated symbolic alias analyses. 
Additionally, a and b share an allocation site on line 7, so an approach based on
the simple alias analysis described above will also fail on this example. This
must-not-alias proof obligation can be discharged with existing techniques [53, 54], but
requires an expensive, on-demand, interprocedural, flow-sensitive alias analysis.

This paper presents CONSORT (CONtext Sensitive Ownership Refinement 
Types), a type system for the automated verification of program safety in imper- 
ative languages with mutability and aliasing. CONSORT is built upon the novel 
combination of refinement types and fractional ownership types [55, 56]. Fractional
ownership types extend pointer types with a rational number in the range
[0, 1] called an ownership. These ownerships encapsulate the permission of the
reference; only references with ownership 1 may be used for mutation. Fractional 
ownership types also obey the following key invariant: any references with a mu- 
table alias must have ownership 0. Thus, any reference with non-zero ownership 
cannot be an alias of a reference with ownership 1. In other words, ownerships 
encode precise aliasing information in the form of must-not aliasing relationships. 

To understand the benefit of this approach, let us return to Figure 1. As mk
returns a freshly allocated reference with no aliases, its type indicates it returns a
reference with ownership 1. Thus, our type system can initially give p and q types
{v:int | v = 3} ref^1 and {v:int | v = 5} ref^1 respectively. The ownership 1 on
the reference type constructor ref indicates both pointers hold “exclusive”
ownership of the pointed-to reference cell; from the invariant of fractional ownership
types, p and q must not alias. The types of both references can be strongly
updated without requiring spurious weak updates. As a result, at the assertion
statement on line 7, p has type {v:int | v = 4} ref^1, expressing the required invariant.

Our type system can also verify the example in Figure 2 without expensive 
side analyses. As a and b are both mutated, they must both have ownership 1; 
i.e., they cannot alias. This precondition is satisfied by all invocations of loop;
on line 7, b has ownership 1 (from the argument type), and the newly allocated 
reference must also have ownership 1. Similarly, both arguments on line 9 have 
ownership 1 (from the assumed ownership on the argument types). 

Ownerships behave linearly; they cannot be duplicated, only split when aliases 
are created. This linear behavior preserves the critical ownership invariant. For 
example, if we replace line 9 in Figure 2 with loop(b,b), the program becomes 
ill-typed; there is no way to divide b’s ownership of 1 into two ownerships of 1.
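
The linear behavior of ownerships can be illustrated with a small sketch. The
helper names below (split, can_split_into) are hypothetical and are not part of
CONSORT; the sketch only assumes that ownerships are rationals in [0, 1] and that a
split divides an ownership exactly, without creating any.

```python
from fractions import Fraction

def split(total, kept):
    """Split ownership `total` into (kept, total - kept).

    Assumption for this sketch: both parts must remain in [0, 1],
    so a split can never create ownership.
    """
    rest = total - kept
    if not (0 <= kept <= 1 and 0 <= rest <= 1):
        raise ValueError("ownership cannot be created by a split")
    return kept, rest

def can_split_into(total, wanted):
    """True iff `total` divides exactly into the ownerships listed in `wanted`."""
    return sum(wanted, Fraction(0)) == total

one = Fraction(1)
half = Fraction(1, 2)

# Two read-only aliases: 1 = 1/2 + 1/2.
read_only_pair = split(one, half)

# One mutable alias and one alias with no permissions: 1 = 1 + 0.
mutable_pair = split(one, one)

# loop(b, b) would need two ownerships of 1 out of a single ownership of 1.
loop_b_b_ok = can_split_into(one, [one, one])
```

Here loop_b_b_ok is False, mirroring why replacing line 9 with loop(b,b) is rejected.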

Ownerships also obviate updating refinement information of aliases at
mutation. CONSORT ensures that only the trivial refinement ⊤ is used in reference
types with ownership 0, i.e., mutably-aliased references. When memory is
mutated through a reference with ownership 1, CONSORT simply updates the
refinement of the mutated reference variable. From the soundness of ownership types,
all aliases have ownership 0 and must therefore only contain the ⊤ refinement.
Thus, the types of all aliases already soundly describe all possible contents.³

3 This assumption holds only if updates do not change simple types, a condition our
type system enforces.

CONSORT is also context-sensitive, and can use different summaries of
function behavior at different points in the program. For example, consider the variant
of Figure 1 shown in Figure 3. The function get returns the contents of its argument,
and is called on lines 5 and 6. To precisely verify this program, on line 5 get
must be typed as a function that takes a reference to 3 and returns 3. Similarly,
on line 6 get must be typed as a function that takes a reference to 5 and returns
5. Our type system can give get a function type that distinguishes between these
two calling contexts and selects the appropriate summary of get’s behavior.

1 get(p) { *p }
2
3 let p = mkref 3 in
4 let q = mkref 5 in
5 p := get(p) + 1;
6 q := get(q) + 1;
7 assert(*p = 4);
8 assert(*q = 6);

Fig. 3. Example of context-sensitivity

We have formalized CONSORT as a type system for a small imperative calculus
and proved the system is sound: i.e., a well-typed program never encounters
assertion failures during execution. We have implemented a prototype type inference
tool targeting this imperative language and found it can automatically verify
several non-trivial programs, including sorted lists and an array list data structure.

The rest of this paper is organized as follows. Section 2 defines the imperative
language targeted by CONSORT and its semantics. Section 3 defines our type
system and states our soundness theorem. Section 4 sketches our implementation’s
inference algorithm and its current limitations. Section 5 describes an
evaluation of our prototype, Section 6 outlines related work, and Section 7 concludes.


2 Target Language 


This section describes a simple imperative language with mutable references and 
first-order, recursive functions. 


2.1 Syntax 
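We assume a set of variables, ranged over by x, y, z, ..., a set of function names,
ranged over by f, and a set of labels, ranged over by ℓ1, ℓ2, .... The grammar of
the language is as follows.

\[
\begin{array}{rcl}
d & ::= & f \mapsto (x_1, \ldots, x_n)\, e \\
e & ::= & x \mid \mathbf{let}\ x = y\ \mathbf{in}\ e \mid \mathbf{let}\ x = n\ \mathbf{in}\ e
          \mid \mathbf{ifz}\ x\ \mathbf{then}\ e_1\ \mathbf{else}\ e_2 \\
  & \mid & \mathbf{let}\ x = \mathbf{mkref}\ y\ \mathbf{in}\ e \mid \mathbf{let}\ x = {*}y\ \mathbf{in}\ e
          \mid \mathbf{let}\ x = f^{\ell}(y_1, \ldots, y_n)\ \mathbf{in}\ e \\
  & \mid & x := y \,;\, e \mid \mathbf{alias}(x = y) \,;\, e \mid \mathbf{alias}(x = {*}y) \,;\, e
          \mid \mathbf{assert}(\varphi) \,;\, e \mid e_1 \,;\, e_2 \\
P & ::= & \langle \{d_1, \ldots, d_n\}, e \rangle
\end{array}
\]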


φ stands for a formula in propositional first-order logic over variables, integers
and contexts; we discuss these formulas later in Section 3.1. 

Variables are introduced by function parameters or let bindings. Like ML, the 
variable bindings introduced by let expressions and parameters are immutable. 
Mutable variable declarations such as int x = 1; in C are achieved in our
language with:

let y = 1 in (let x = mkref y in ...) .

As a convenience, we assume all variable names introduced with let bindings and
function parameters are distinct.

Unlike ML (and like C or Java) we do not allow general expressions on the
right hand side of let bindings. The simplest right hand forms are a variable y or
an integer literal n. mkref y creates a reference cell with value y, and *y accesses
the contents of reference y. For simplicity, we do not include an explicit null value;
an extension to support null is discussed in Section 4. Function calls must occur
on the right hand side of a variable binding and take the form f^ℓ(x1, ..., xn),
where x1, ..., xn are distinct variables and ℓ is a (unique) label. These labels are
used to make our type system context-sensitive as discussed in Section 3.3.

The single base case for expressions is a single variable. If the variable 
expression is executed in a tail position of a function, then the value of that 
variable is the return value of the function, otherwise the value is ignored. 

The only intraprocedural control-flow operations in our language are if state- 
ments. ifz checks whether the condition variable z equals zero and chooses the 
corresponding branch. Loops can be implemented with recursive functions and 
we do not include them explicitly in our formalism. 

Our grammar requires that side-effecting, result-free statements, assert(φ),
alias(x = y), alias(x = *y), and assignment x := y, are followed by a
continuation expression. We impose this requirement for technical reasons to ease our
formal presentation; this requirement does not reduce expressiveness as dummy
continuations can be inserted as needed. The assert(φ); e form executes e if
the predicate φ holds in the current state and aborts the program otherwise.
alias(x = y); e and alias(x = *y); e assert a must-aliasing relationship between
x and y (resp. x and *y) and then execute e. alias statements are effectively
annotations that our type system exploits to gain added precision. x := y; e updates
the contents of the memory cell pointed to by x with the value of y. In addition
to the above continuations, our language supports general sequencing with e1; e2.

A program is a pair ⟨D, e⟩, where D = {d1, ..., dn} is a set of first-order,
mutually recursive function definitions, and e is the program entry point. A
function definition d maps the function name to a tuple of argument names
x1, ..., xn that are bound within the function body e.


Paper Syntax. In the remainder of the paper, we will write programs that are 
technically illegal according to our grammar, but can be easily “de-sugared” into 
an equivalent, valid program. For example, we will write 


let x = mkref 4 in assert (*x = 4) 
as syntactic sugar for: 


let f = 4 in let x = mkref f in 
let tmp = *x in assert(tmp = 4); let dummy = 0 in dummy


2.2 Operational Semantics 


We now introduce the operational semantics for our language. We assume a 
finite domain of heap addresses Addr: we denote an arbitrary address with a. 
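\[
\begin{array}{c}
\langle H, R, F : \vec{F}, x \rangle \longrightarrow_D \langle H, R, \vec{F}, F[x] \rangle
\quad \text{(R-VAR)}
\\[2ex]
\langle H, R, \vec{F}, E[x ; e] \rangle \longrightarrow_D \langle H, R, \vec{F}, E[e] \rangle
\quad \text{(R-SEQ)}
\\[2ex]
\dfrac{x' \notin \mathit{dom}(R)}
      {\langle H, R, \vec{F}, E[\mathbf{let}\ x = y\ \mathbf{in}\ e] \rangle \longrightarrow_D
       \langle H, R\{x' \mapsto R(y)\}, \vec{F}, E[[x'/x]e] \rangle}
\quad \text{(R-LET)}
\\[3ex]
\dfrac{x' \notin \mathit{dom}(R)}
      {\langle H, R, \vec{F}, E[\mathbf{let}\ x = n\ \mathbf{in}\ e] \rangle \longrightarrow_D
       \langle H, R\{x' \mapsto n\}, \vec{F}, E[[x'/x]e] \rangle}
\quad \text{(R-LETINT)}
\\[3ex]
\dfrac{R(x) = 0}
      {\langle H, R, \vec{F}, E[\mathbf{ifz}\ x\ \mathbf{then}\ e_1\ \mathbf{else}\ e_2] \rangle \longrightarrow_D
       \langle H, R, \vec{F}, E[e_1] \rangle}
\quad \text{(R-IFTRUE)}
\\[3ex]
\dfrac{R(x) \neq 0}
      {\langle H, R, \vec{F}, E[\mathbf{ifz}\ x\ \mathbf{then}\ e_1\ \mathbf{else}\ e_2] \rangle \longrightarrow_D
       \langle H, R, \vec{F}, E[e_2] \rangle}
\quad \text{(R-IFFALSE)}
\\[3ex]
\dfrac{a \notin \mathit{dom}(H) \qquad x' \notin \mathit{dom}(R)}
      {\langle H, R, \vec{F}, E[\mathbf{let}\ x = \mathbf{mkref}\ y\ \mathbf{in}\ e] \rangle \longrightarrow_D
       \langle H\{a \mapsto R(y)\}, R\{x' \mapsto a\}, \vec{F}, E[[x'/x]e] \rangle}
\quad \text{(R-MKREF)}
\\[3ex]
\dfrac{R(y) = a \qquad H(a) = v \qquad x' \notin \mathit{dom}(R)}
      {\langle H, R, \vec{F}, E[\mathbf{let}\ x = {*}y\ \mathbf{in}\ e] \rangle \longrightarrow_D
       \langle H, R\{x' \mapsto v\}, \vec{F}, E[[x'/x]e] \rangle}
\quad \text{(R-DEREF)}
\end{array}
\]

Fig. 4. Transition Rules (1).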


A runtime state is represented by a configuration ⟨H, R, F⃗, e⟩, which consists
of a heap, register file, stack, and currently reducing expression respectively.
The register file maps variables to runtime values v, which are either integers n
or addresses a. The heap maps a finite subset of addresses to runtime values.
The runtime stack represents pending function calls as a sequence of return
contexts, which we describe below. While the final configuration component is an
expression, the rewriting rules are defined in terms of E[e], which is an evaluation
context E and redex e, as is standard. The grammar for evaluation contexts is
defined by: E ::= E′; e | [].

Our operational semantics is given in Figures 4 and 5. We write dom(H) to
indicate the domain of a function and H{a ↦ v} where a ∉ dom(H) to denote a
map which takes all values in dom(H) to their values in H and which additionally
takes a to v. We will write H{a ↩ v} where a ∈ dom(H) to denote a map
equivalent to H except that a takes value v. We use similar notation for dom(R)
and R{x ↦ v}. We also write ∅ for the empty register file and heap. The step
relation →_D is parameterized by a set of function definitions D; a program ⟨D, e⟩
is executed by stepping the initial configuration ⟨∅, ∅, ·, e⟩ according to →_D.
The semantics is mostly standard; we highlight some important points below. 

Return contexts F take the form E[let y = []^ℓ in e]. A return context represents
a pending function call with label ℓ, and indicates that y should be bound to
the return value of the callee during the execution of e within the larger execution
context E. The call stack F⃗ is a sequence of these contexts, with the first such
return context representing the most recent function call. The stack grows at
function calls as described by rule R-CALL. For a call E[let x = f^ℓ(y1, ..., yn) in e]
where f is defined as (x1, ..., xn)e′, the return context E[let x = []^ℓ in e] is
prepended onto the stack of the input configuration. The substitution of formal
arguments for parameters in e′, denoted by [y1/x1]···[yn/xn]e′, becomes the
currently reducing expression in the output configuration. Function returns are
handled by R-VAR. Our semantics returns values by name; when the currently
executing function fully reduces to a single variable x, x is substituted into the
return context on the top of the stack, denoted by E[let y = []^ℓ in e][x].

\[
\begin{array}{c}
\dfrac{f \mapsto (x_1, \ldots, x_n)\, e \in D}
      {\langle H, R, \vec{F}, E[\mathbf{let}\ x = f^{\ell}(y_1, \ldots, y_n)\ \mathbf{in}\ e'] \rangle \longrightarrow_D
       \langle H, R, E[\mathbf{let}\ x = []^{\ell}\ \mathbf{in}\ e'] : \vec{F}, [y_1/x_1] \cdots [y_n/x_n]e \rangle}
\quad \text{(R-CALL)}
\\[3ex]
\dfrac{R(x) = a \qquad a \in \mathit{dom}(H)}
      {\langle H, R, \vec{F}, E[x := y ; e] \rangle \longrightarrow_D
       \langle H\{a \hookleftarrow R(y)\}, R, \vec{F}, E[e] \rangle}
\quad \text{(R-ASSIGN)}
\\[3ex]
\dfrac{R(x) = R(y)}
      {\langle H, R, \vec{F}, E[\mathbf{alias}(x = y) ; e] \rangle \longrightarrow_D
       \langle H, R, \vec{F}, E[e] \rangle}
\quad \text{(R-ALIAS)}
\\[3ex]
\dfrac{R(y) = a \qquad H(a) = R(x)}
      {\langle H, R, \vec{F}, E[\mathbf{alias}(x = {*}y) ; e] \rangle \longrightarrow_D
       \langle H, R, \vec{F}, E[e] \rangle}
\quad \text{(R-ALIASPTR)}
\\[3ex]
\dfrac{R(x) \neq R(y)}
      {\langle H, R, \vec{F}, E[\mathbf{alias}(x = y) ; e] \rangle \longrightarrow_D \mathbf{AliasFail}}
\quad \text{(R-ALIASFAIL)}
\\[3ex]
\dfrac{R(x) \neq H(R(y))}
      {\langle H, R, \vec{F}, E[\mathbf{alias}(x = {*}y) ; e] \rangle \longrightarrow_D \mathbf{AliasFail}}
\quad \text{(R-ALIASPTRFAIL)}
\\[3ex]
\dfrac{\models [R]\,\varphi}
      {\langle H, R, \vec{F}, E[\mathbf{assert}(\varphi) ; e] \rangle \longrightarrow_D
       \langle H, R, \vec{F}, E[e] \rangle}
\quad \text{(R-ASSERT)}
\\[3ex]
\dfrac{\not\models [R]\,\varphi}
      {\langle H, R, \vec{F}, E[\mathbf{assert}(\varphi) ; e] \rangle \longrightarrow_D \mathbf{AssertFail}}
\quad \text{(R-ASSERTFAIL)}
\end{array}
\]

Fig. 5. Transition Rules (2).


In the rule R-ASSERT we write ⊨ [R]φ to mean that the formula yielded
by substituting the concrete values in R for the variables in φ is valid within
some chosen logic (see Section 3.1); in R-ASSERTFAIL we write ⊭ [R]φ when
the formula is not valid. The substitution operation [R]φ is defined inductively
as [∅]φ = φ, [R{x ↦ n}]φ = [R][n/x]φ, [R{x ↦ a}]φ = [R]φ. In the case of an
assertion failure, the semantics steps to a distinguished configuration AssertFail. 
The goal of our type system is to show that no execution of a well-typed program 
may reach this configuration. The alias form checks whether the two references 
actually alias; i.e., if the must-alias assertion provided by the programmer is 
correct. If not, our semantics steps to the distinguished AliasFail configuration. 
Our type system does not guarantee that AliasFail is unreachable; aliasing 
assertions are effectively trusted annotations that are assumed to hold. 
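
The intraprocedural part of these rules can be sketched as a small interpreter.
This is a simplified illustration, not the paper's formalization: it omits function
calls and the call stack, evaluates in a loop instead of using evaluation contexts,
skips the renaming of bound variables (all names are assumed distinct), and encodes
expressions as Python tuples and assertion formulas as Python functions. All names
in the sketch (run, the tuple tags, AssertFail, AliasFail as exceptions) are
assumptions made for the example.

```python
class AssertFail(Exception):
    """Stands in for the distinguished AssertFail configuration."""

class AliasFail(Exception):
    """Stands in for the distinguished AliasFail configuration."""

def run(e, H=None, R=None):
    """Evaluate expression e under heap H and register file R.

    Returns (value, H, R) once e has reduced to a single variable.
    """
    H = {} if H is None else H
    R = {} if R is None else R
    while True:
        tag = e[0]
        if tag == "var":                      # base case: a single variable
            return R[e[1]], H, R
        if tag == "let":                      # let x = y in body
            _, x, y, body = e
            R[x] = R[y]
        elif tag == "letint":                 # let x = n in body
            _, x, n, body = e
            R[x] = n
        elif tag == "ifz":                    # ifz x then e1 else e2
            _, x, e1, e2 = e
            body = e1 if R[x] == 0 else e2
        elif tag == "mkref":                  # let x = mkref y in body
            _, x, y, body = e
            a = ("addr", len(H))              # fresh address, not in dom(H)
            H[a] = R[y]
            R[x] = a
        elif tag == "deref":                  # let x = *y in body
            _, x, y, body = e
            R[x] = H[R[y]]
        elif tag == "assign":                 # x := y ; body
            _, x, y, body = e
            H[R[x]] = R[y]
        elif tag == "alias":                  # alias(x = y) ; body
            _, x, y, body = e
            if R[x] != R[y]:
                raise AliasFail()
        elif tag == "aliasptr":               # alias(x = *y) ; body
            _, x, y, body = e
            if R[x] != H[R[y]]:
                raise AliasFail()
        elif tag == "assert":                 # assert(phi) ; body
            _, phi, body = e
            # Only integer-valued registers are substituted into the formula.
            ints = {k: v for k, v in R.items() if isinstance(v, int)}
            if not phi(ints):
                raise AssertFail()
        elif tag == "seq":                    # e1 ; e2
            _, e1, body = e
            run(e1, H, R)
        else:
            raise ValueError("unknown expression: %r" % (tag,))
        e = body

# The de-sugared program from "Paper Syntax":
# let f = 4 in let x = mkref f in
# let tmp = *x in assert(tmp = 4); let dummy = 0 in dummy
prog = ("letint", "f", 4,
        ("mkref", "x", "f",
         ("deref", "tmp", "x",
          ("assert", lambda r: r["tmp"] == 4,
           ("letint", "dummy", 0, ("var", "dummy"))))))

value, heap, regs = run(prog)
```

Running prog completes without raising AssertFail and returns the value of dummy.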


In order to avoid duplicate variable names in our register file due to recursive 
functions, we refresh the bound variable x in a let expression to x’. Take expression 
let x = yin e as an example; we substitute a fresh variable x’ for x in e, then bind 
x’ to the value of variable y. We assume this refreshing of variables preserves our 
assumption that all variable bindings introduced with let and function parameters 
are unique, i.e. x’ does not overlap with variable names that occur in the program. 
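\[
\begin{array}{llcl}
\text{Types} & \tau & ::= & \{v : \mathbf{int} \mid \varphi\} \mid \tau\ \mathbf{ref}^{r} \\
\text{Ownership} & r & \in & [0, 1] \\
\text{Refinements} & \varphi & ::= & \varphi_1 \vee \varphi_2 \mid \neg\varphi \mid \top \\
 & & \mid & \phi(\hat{v}_1, \ldots, \hat{v}_n) \\
 & & \mid & \hat{v}_1 = \hat{v}_2 \\
 & & \mid & \mathit{CP} \\
\text{Ref. Values} & \hat{v} & ::= & x \mid n \mid v \\
\text{Function Types} & \sigma & ::= & \forall\lambda.\ \langle x_1 : \tau_1, \ldots, x_n : \tau_n \rangle
   \rightarrow \langle x_1 : \tau_1', \ldots, x_n : \tau_n' \mid \tau \rangle \\
\text{Context Variables} & \lambda & \in & \mathbf{CVar} \\
\text{Concrete Context} & \vec{\ell} & ::= & \ell : \vec{\ell} \mid \epsilon \\
\text{Pred. Context} & C & ::= & \ell : C \mid \lambda \mid \epsilon \\
\text{Context Query} & \mathit{CP} & ::= & \vec{\ell} \preceq C \\
\text{Typing Context} & \mathcal{L} & ::= & \lambda \mid \vec{\ell}
\end{array}
\]

Fig. 6. Syntax of types, refinements, and contexts.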


3 Typing 


We now introduce a fractional ownership refinement type system that guarantees 
well-typed programs do not encounter assertion failures. 


3.1 Types and Contexts 


The syntax of types is given in Figure 6. Our type system has two type
constructors: references and integers. τ ref^r is the type of a (non-null) reference to a
value of type τ. r is an ownership, which is a rational number in the range [0, 1].
An ownership of 0 indicates a reference that cannot be written, and for which 
there may exist a mutable alias. By contrast, 1 indicates a pointer with exclusive 
ownership that can be read and written. Reference types with ownership values 
between these two extremes indicate a pointer that is readable but not writable, 
and for which no mutable aliases exist. CONSORT ensures that these invariants 
hold while aliases are created and destroyed during execution. 

Integers are refined with a predicate φ. The language of predicates is built using
the standard logical connectives of first-order logic, with (in)equality between
variables and integers, and atomic predicate symbols ϕ as the basic atoms. We
include a special “value” variable v representing the value being refined by the
predicate. For simplicity, we omit the connectives φ1 ∧ φ2 and φ1 ⟹ φ2; they
can be written as derived forms using the given connectives. We do not fix a
particular theory from which ϕ are drawn, provided a sound (but not necessarily
complete) decision procedure exists. CP are context predicates, which are used
for context sensitivity as explained below.


Example 1. {v:int | v > 0} is the type of strictly positive integers. The type
of immutable references to integers exactly equal to 3 can be expressed by
{v:int | v = 3} ref^0.5.


As is standard, we denote a type environment with Γ, which is a finite map
from variable names to types τ. We write Γ[x : τ] to denote a type environment
Γ such that Γ(x) = τ where x ∈ dom(Γ), Γ, x : τ to indicate the extension of
Γ with the type binding x : τ, and Γ[x ↩ τ] to indicate the type environment
Γ with the binding of x updated to τ. We write the empty environment as
•. The treatment of type environments as mappings instead of sequences in a
dependent type system is somewhat non-standard. The standard formulation 
based on ordered sequences of bindings and its corresponding well-formedness 
condition did not easily admit variables with mutually dependent refinements 
as introduced by our function types (see below). We therefore use an unordered 
environment and relax well-formedness to ignore variable binding order. 


Function Types, Contexts, and Context Polymorphism. Our type system achieves
context sensitivity by allowing function types to depend on where a function is
called, i.e., the execution context of the function invocation. Our system represents
concrete execution contexts with strings of call site labels (or just “call strings”),
defined by ℓ⃗ ::= ε | ℓ : ℓ⃗. As is standard (e.g., [49, 50]), the string ℓ : ℓ⃗ abstracts an
execution context where the most recent, active function call occurred at call site
ℓ which itself was executed in a context abstracted by ℓ⃗; ε is the context under
which program execution begins. Context variables, drawn from a finite domain
CVar and ranged over by λ1, λ2, ..., represent arbitrary, unknown contexts.

A function type takes the form ∀λ. ⟨x1 : τ1, ..., xn : τn⟩ → ⟨x1 : τ1′, ..., xn : τn′ | τ⟩.
The arguments of a function are an n-ary tuple of types τi. To model side-effects on
arguments, the function type includes the same number of output types τi′. In
addition, function types have a direct return type τ. The argument and output types
are given names: refinements within the function type may refer to these names.
Function types in our language are context polymorphic, expressed by universal
quantification “∀λ.” over a context variable. Intuitively, this context variable
represents the many different execution contexts under which a function may be called.

Argument and return types may depend on this context variable by including
context query predicates in their refinements. A context query predicate CP
usually takes the form ℓ⃗ ⪯ λ, and is true iff ℓ⃗ is a prefix of the concrete context
represented by λ. Intuitively, a refinement ℓ⃗ ⪯ λ ⟹ φ states that φ holds in any
concrete execution context with prefix ℓ⃗, and provides no information in any other
context. In full generality, a context query predicate may be of the form ℓ⃗1 ⪯ ℓ⃗2
or ℓ⃗ ⪯ ℓ1 ... ℓn : λ; these forms may be immediately simplified to ⊤, ⊥ or ℓ⃗′ ⪯ λ.


Example 2. The type {v:int | (ℓ1 ⪯ λ ⟹ v = 3) ∧ (ℓ2 ⪯ λ ⟹ v = 5)}
represents an integer that is 3 if the most recent active function call site is ℓ1, 5 if
the most recent call site is ℓ2, and is otherwise unconstrained. This type may be
used for the argument of f in, e.g., f^ℓ1(3) + f^ℓ2(5).
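
As an illustration of how such a refinement can be evaluated against a concrete
context, the sketch below encodes call strings as Python tuples with the most
recent call site first. The function names and the label strings "l1", "l2", "l3"
are hypothetical and chosen only for this example.

```python
def is_prefix(query, context):
    """True iff the call string `query` is a prefix of the concrete `context`.

    Both are tuples of call-site labels, most recent call site first.
    """
    return tuple(context[:len(query)]) == tuple(query)

def arg_refinement(v, context):
    """The refinement of Example 2, instantiated at a concrete context:
    (l1 is a prefix of context implies v = 3) and
    (l2 is a prefix of context implies v = 5).
    """
    return ((not is_prefix(("l1",), context) or v == 3) and
            (not is_prefix(("l2",), context) or v == 5))

# Called from site l1 the argument must be 3; from site l2 it must be 5.
at_l1 = arg_refinement(3, ("l1",))
at_l2_nested = arg_refinement(5, ("l2", "l1"))
# In any other context the refinement places no constraint on v.
elsewhere = arg_refinement(42, ("l3",))
```

All three results are True, while arg_refinement(5, ("l1",)) is False.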


As types in our type system may contain context variables, our typing
judgment (introduced below) includes a typing context ℒ, which is either a
single context variable λ or a concrete context ℓ⃗. This typing context represents
the assumptions about the execution context of the term being typed. If the
typing context is a context variable λ, then no assumptions are made about the
execution context of the term, although types may depend upon λ with context
query predicates. Accordingly, function bodies are typed under the context
variable universally quantified over in the corresponding function type; i.e., no
assumptions are made about the exact execution context of the function body.
As in parametric polymorphism, consistent substitution of a concrete context
ℓ⃗ for a context variable λ in a typing derivation yields a valid type derivation
under concrete context ℓ⃗.


Remark 1. The context-sensitivity scheme described here corresponds to the 
standard CFA approach [50] without a priori call-string limiting. We chose this 
scheme because it can be easily encoded with equality over integer variables (see 
Section 4), but in principle another context-sensitivity strategy could be used 
instead. The important feature of our type system is the inclusion of predicates 
over contexts, not the specific choice for these predicates. 


Function type environments are denoted with Θ and are finite maps from
function names (f) to function types (σ).


Well Formedness. We impose two well-formedness conditions on types: ownership
well-formedness and refinement well-formedness. The ownership condition is
purely syntactic: τ is ownership well-formed if τ = τ′ ref^0 implies τ′ = ⊤_n for
some n. ⊤_i is the “maximal” type of a chain of i references, and is defined
inductively as ⊤_0 = {v:int | ⊤}, ⊤_i = ⊤_(i−1) ref^0.

The ownership well-formedness condition ensures that aliases introduced via
heap writes do not violate the invariant of ownership types and that refinements
are consistent with updates performed through mutable aliases. Recall our
ownership type invariant ensures all aliases of a mutable reference have 0 ownership.
Any mutations through that mutable alias will therefore be consistent with the
“no information” ⊤ refinement required by this well-formedness condition.
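
The ownership condition can be read as a simple recursive check. The sketch below
is one possible reading, with a hypothetical encoding of types (Int, Ref) and of the
trivial refinement (TOP); applying the check to nested content types as well is an
assumption of this sketch.

```python
from dataclasses import dataclass
from fractions import Fraction

TOP = "true"  # hypothetical encoding of the trivial refinement

@dataclass(frozen=True)
class Int:
    refinement: str = TOP

@dataclass(frozen=True)
class Ref:
    content: object
    ownership: Fraction

def is_maximal(t):
    """True iff t is the maximal type of some chain of references:
    an integer refined by TOP under zero or more refs of ownership 0."""
    if isinstance(t, Int):
        return t.refinement == TOP
    return t.ownership == 0 and is_maximal(t.content)

def ownership_wf(t):
    """A reference with ownership 0 may only point to a maximal type.
    Assumption: content types of other references are checked recursively."""
    if isinstance(t, Int):
        return True
    if t.ownership == 0:
        return is_maximal(t.content)
    return ownership_wf(t.content)

readable = Ref(Int("v = 3"), Fraction(1, 2))   # well-formed
stale = Ref(Int("v = 3"), Fraction(0))         # ill-formed: refinement at ownership 0
top_1 = Ref(Int(), Fraction(0))                # maximal type of a chain of one reference
```

Here ownership_wf(readable) and ownership_wf(top_1) hold, while ownership_wf(stale) does not.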

Refinement well-formedness, denoted ℒ | Γ ⊢_WF φ, ensures that free program
variables in refinement φ are bound in a type environment Γ and have integer type.
It also requires that for a typing context ℒ = λ, only context query predicates
over λ are used (no such predicates may be used if ℒ = ℓ⃗). Notice this condition
forbids refinements that refer to references. Although ownership information can
signal when refinements on a mutably-aliased reference must be discarded, our
current formulation provides no such information for refinements that mention
mutably-aliased references. We therefore conservatively reject such refinements
at the cost of some expressiveness in our type system.

We write ℒ | Γ ⊢_WF τ to indicate a well-formed type where all refinements are
well-formed with respect to ℒ and Γ. We write ℒ ⊢_WF Γ for a type environment
where all types are well-formed. A function environment is well-formed (written
⊢_WF Θ) if, for every σ in Θ, the argument, result, and output types are
well-formed with respect to each other and the context variable quantified over in σ.
As the formal definition of refinement well-formedness is fairly standard, we omit
it for space reasons (the full definition may be found in the full version [60]).


3.2 Intraprocedural Type System 


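We now introduce the type system for the intraprocedural fragment of our
language. Accordingly, this section focuses on the interplay of mutability and
refinement types. The typing rules are given in Figures 7 and 8. A typing judgment
takes the form Θ | ℒ | Γ ⊢ e : τ ⇒ Γ′, which indicates that e is well-typed under
a function type environment Θ, typing context ℒ, and type environment Γ, and
evaluates to a value of type τ and modifies the input environment according to Γ′.
Any valid typing derivation must have ℒ ⊢_WF Γ, ℒ ⊢_WF Γ′, and ℒ | Γ′ ⊢_WF τ,
i.e., the input and output type environments and result type must be well-formed.

\[
\begin{array}{c}
\Theta \mid \mathcal{L} \mid \Gamma[x : \tau_1 + \tau_2] \vdash x : \tau_1 \Rightarrow \Gamma[x \hookleftarrow \tau_2]
\quad \text{(T-VAR)}
\\[3ex]
\dfrac{\Theta \mid \mathcal{L} \mid \Gamma[y \hookleftarrow \tau_1 \wedge_y y =_{\tau_1} x],\ x : (\tau_2 \wedge_x x =_{\tau_2} y)
        \vdash e : \tau \Rightarrow \Gamma' \qquad x \notin \mathit{dom}(\Gamma')}
      {\Theta \mid \mathcal{L} \mid \Gamma[y : \tau_1 + \tau_2] \vdash \mathbf{let}\ x = y\ \mathbf{in}\ e : \tau \Rightarrow \Gamma'}
\quad \text{(T-LET)}
\\[3ex]
\dfrac{\Theta \mid \mathcal{L} \mid \Gamma,\ x : \{v : \mathbf{int} \mid v = n\} \vdash e : \tau \Rightarrow \Gamma'
        \qquad x \notin \mathit{dom}(\Gamma')}
      {\Theta \mid \mathcal{L} \mid \Gamma \vdash \mathbf{let}\ x = n\ \mathbf{in}\ e : \tau \Rightarrow \Gamma'}
\quad \text{(T-LETINT)}
\\[3ex]
\dfrac{\Theta \mid \mathcal{L} \mid \Gamma[x \hookleftarrow \{v : \mathbf{int} \mid \varphi \wedge v = 0\}] \vdash e_1 : \tau \Rightarrow \Gamma'
        \qquad
        \Theta \mid \mathcal{L} \mid \Gamma[x \hookleftarrow \{v : \mathbf{int} \mid \varphi \wedge v \neq 0\}] \vdash e_2 : \tau \Rightarrow \Gamma'}
      {\Theta \mid \mathcal{L} \mid \Gamma[x : \{v : \mathbf{int} \mid \varphi\}] \vdash
        \mathbf{ifz}\ x\ \mathbf{then}\ e_1\ \mathbf{else}\ e_2 : \tau \Rightarrow \Gamma'}
\quad \text{(T-IF)}
\\[3ex]
\dfrac{\Theta \mid \mathcal{L} \mid \Gamma[y \hookleftarrow \tau_1],\ x : (\tau_2 \wedge_x x =_{\tau_2} y)\ \mathbf{ref}^{1}
        \vdash e : \tau \Rightarrow \Gamma' \qquad x \notin \mathit{dom}(\Gamma')}
      {\Theta \mid \mathcal{L} \mid \Gamma[y : \tau_1 + \tau_2] \vdash \mathbf{let}\ x = \mathbf{mkref}\ y\ \mathbf{in}\ e : \tau \Rightarrow \Gamma'}
\quad \text{(T-MKREF)}
\\[3ex]
\dfrac{\Theta \mid \mathcal{L} \mid \Gamma \vdash e_1 : \tau' \Rightarrow \Gamma'
        \qquad \Theta \mid \mathcal{L} \mid \Gamma' \vdash e_2 : \tau'' \Rightarrow \Gamma''}
      {\Theta \mid \mathcal{L} \mid \Gamma \vdash e_1 ; e_2 : \tau'' \Rightarrow \Gamma''}
\quad \text{(T-SEQ)}
\\[3ex]
\dfrac{\tau' = \begin{cases} \tau_1 \wedge_y y =_{\tau_1} x & r > 0 \\ \tau_1 & r = 0 \end{cases}
        \qquad
        \Theta \mid \mathcal{L} \mid \Gamma[y \hookleftarrow \tau'\ \mathbf{ref}^{r}],\ x : \tau_2 \vdash e : \tau \Rightarrow \Gamma'
        \qquad x \notin \mathit{dom}(\Gamma')}
      {\Theta \mid \mathcal{L} \mid \Gamma[y : (\tau_1 + \tau_2)\ \mathbf{ref}^{r}] \vdash \mathbf{let}\ x = {*}y\ \mathbf{in}\ e : \tau \Rightarrow \Gamma'}
\quad \text{(T-DEREF)}
\\[3ex]
\dfrac{\Gamma \models \varphi \qquad \epsilon \mid \Gamma \vdash_{\mathit{WF}} \varphi
        \qquad \Theta \mid \mathcal{L} \mid \Gamma \vdash e : \tau \Rightarrow \Gamma'}
      {\Theta \mid \mathcal{L} \mid \Gamma \vdash \mathbf{assert}(\varphi) ; e : \tau \Rightarrow \Gamma'}
\quad \text{(T-ASSERT)}
\end{array}
\]

Fig. 7. Expression typing rules.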

The typing rules in Figure 7 handle the relatively standard features in our
language. The rule T-SEQ for sequential composition is fairly straightforward
except that the output type environment for e1 is the input type environment for
e2. T-LETINT is also straightforward; since x is bound to a constant, it is given
type {v:int | v = n} to indicate x is exactly n. The output type environment Γ′
cannot mention x (expressed with x ∉ dom(Γ′)) to prevent x from escaping its
scope. This requirement can be met by applying the subtyping rule (see below) to
weaken refinements to no longer mention x. As in other refinement type systems
[47], this requirement is critical for ensuring soundness.


Rule T-LET is crucial to understanding our ownership type system. The
body of the let expression e is typechecked under a type environment where
the type of y in Γ is linearly split into two types: τ1 for y and τ2 for the newly
created binding x. This splitting is expressed using the + operator. If y is a
reference type, the split operation distributes some portion of y’s ownership
information to its new alias x. The split operation also distributes refinement
information between the two types. For example, type {v:int | v > 0} ref^1 can be
split into (1) {v:int | v > 0} ref^r and {v:int | v > 0} ref^(1−r) (for r ∈ (0, 1)),
i.e., two immutable references with non-trivial refinement information, or (2)
{v:int | v > 0} ref^1 and {v:int | ⊤} ref^0, where one of the aliases is mutable
and the other provides no refinement information. How a type is split depends
on the usage of x and y in e. Formally, we define the type addition operator as
the least commutative partial operation that satisfies the following rules:


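\[
\begin{array}{ll}
\{v : \mathbf{int} \mid \varphi_1\} + \{v : \mathbf{int} \mid \varphi_2\} = \{v : \mathbf{int} \mid \varphi_1 \wedge \varphi_2\}
  & \text{(TADD-INT)} \\
\tau_1\ \mathbf{ref}^{r_1} + \tau_2\ \mathbf{ref}^{r_2} = (\tau_1 + \tau_2)\ \mathbf{ref}^{r_1 + r_2}
  & \text{(TADD-REF)}
\end{array}
\]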


Viewed another way, type addition describes how to combine two types for the 
same value such that the combination soundly incorporates all information from 
the two original types. Critically, the type addition operation cannot create or 
destroy ownership and refinement information, only combine or divide it between 
types. Although not explicit in the rules, by ownership well-formedness, if the 
entirety of a reference’s ownership is transferred to another type during a split, 
all refinements in the remaining type must be ⊤.
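
The two addition rules can be sketched directly. In this illustration refinements
are encoded as sets of atomic conjuncts (the empty set standing for ⊤), so that
conjunction becomes set union; this encoding, the class names, and the explicit
check that the summed ownership stays within [0, 1] are assumptions of the sketch.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class Int:
    conjuncts: frozenset = frozenset()  # empty set encodes the trivial refinement

@dataclass(frozen=True)
class Ref:
    content: object
    ownership: Fraction

def add(t1, t2):
    """Type addition: combine two types for the same value."""
    if isinstance(t1, Int) and isinstance(t2, Int):
        # TADD-INT: conjoin the refinements.
        return Int(t1.conjuncts | t2.conjuncts)
    if isinstance(t1, Ref) and isinstance(t2, Ref):
        # TADD-REF: add content types and ownerships.
        total = t1.ownership + t2.ownership
        if total > 1:
            raise ValueError("combined ownership would exceed 1")
        return Ref(add(t1.content, t2.content), total)
    raise TypeError("type addition is undefined for different shapes")

pos = Int(frozenset({"v > 0"}))
top = Int()

# Split (1): two read-only references carrying the refinement.
sum_read_only = add(Ref(pos, Fraction(1, 3)), Ref(pos, Fraction(2, 3)))
# Split (2): one mutable reference plus an alias with no information.
sum_mutable = add(Ref(pos, Fraction(1)), Ref(top, Fraction(0)))
```

Both sums equal Ref(pos, Fraction(1)), i.e., the original type {v:int | v > 0} ref^1.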

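The additional bits ∧_y y =_τ1 x and ∧_x x =_τ2 y express equality between x and
y as refinements. We use the strengthening operation τ ∧_x φ and typed equality
proposition x =_τ y, defined respectively as:

\[
\begin{array}{ll}
\{v : \mathbf{int} \mid \varphi\} \wedge_y \varphi' = \{v : \mathbf{int} \mid \varphi \wedge [v/y]\varphi'\}
  & (x =_{\{v : \mathbf{int} \mid \varphi\}} y) = (x = y) \\
\tau\ \mathbf{ref}^{r} \wedge_y \varphi' = \tau\ \mathbf{ref}^{r}
  & (x =_{\tau\ \mathbf{ref}^{r}} y) = \top
\end{array}
\]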


We do not track equality between references or between the contents of aliased 
reference cells as doing so would violate our refinement well-formedness condition. 
These operations are also used in other rules that can introduce equality. 

Rule T-MKREF is very similar to T-LET, except that x is given a reference
type of ownership 1 pointing to τ2, which is obtained by splitting the type of y. In
T-DEREF, the content type of y is split and distributed to x. The strengthening
is conditionally applied depending on the ownership of the dereferenced pointer,
that is, if r = 0, τ′ has to be a maximal type ⊤_i.

Our type system also tracks path information; in the T-IF rule, we update the 
refinement on the condition variable within the respective branches to indicate 
whether the variable must be zero. By requiring both branches to produce the 
same output type environment, we guarantee that these conflicting refinements 
are rectified within the type derivations of the two branches. 

The type rule for assert statements has the precondition Γ ⊨ φ, which is
defined to be ⊨ ⟦Γ⟧ ⟹ φ, i.e., the logical formula ⟦Γ⟧ ⟹ φ is valid in the
chosen theory. ⟦Γ⟧ lifts the refinements on the integer-valued variables into a
proposition in the logic used for verification. This denotation operation is defined
as:

\[
\begin{array}{ll}
[\![\bullet]\!] = \top
  & [\![\{v : \mathbf{int} \mid \varphi\}]\!]_y = [y/v]\varphi \\
[\![\Gamma, x : \tau]\!] = [\![\Gamma]\!] \wedge [\![\tau]\!]_x
  & [\![\tau'\ \mathbf{ref}^{r}]\!]_y = \top
\end{array}
\]

If the formula ⟦Γ⟧ ⟹ φ is valid, then in any context and under any valuation
of program variables that satisfy the refinements in ⟦Γ⟧, the predicate φ must be
true and the assertion must not fail. This intuition forms the foundation of our
soundness claim (Section 3.4).
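
As a small worked instance (our own illustration, assuming the chosen theory includes
linear integer arithmetic), take the environment
Γ = x : {v:int | v > 0}, y : {v:int | v = x + 1}, p : {v:int | v = 3} ref^1
and the assertion y > 1. Unfolding the denotation from the empty environment gives:

\[
\begin{aligned}
[\![\Gamma]\!] &= \top \wedge [x/v](v > 0) \wedge [y/v](v = x + 1) \wedge \top \\
               &\equiv x > 0 \wedge y = x + 1
\end{aligned}
\]

The reference p contributes only ⊤, since references are not lifted into the logic.
Because x > 0 ∧ y = x + 1 ⟹ y > 1 is valid over the integers, Γ ⊨ y > 1 holds
and the assert precondition is met.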
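\[
\begin{array}{c}
\dfrac{(\text{The shapes of } \tau' \text{ and } \tau_2 \text{ are similar})
        \qquad
        \Theta \mid \mathcal{L} \mid \Gamma[x \hookleftarrow \tau_1][y \hookleftarrow (\tau_2 \wedge_y y =_{\tau_2} x)\ \mathbf{ref}^{1}]
        \vdash e : \tau \Rightarrow \Gamma'}
      {\Theta \mid \mathcal{L} \mid \Gamma[x : \tau_1 + \tau_2][y : \tau'\ \mathbf{ref}^{1}] \vdash y := x ; e : \tau \Rightarrow \Gamma'}
\quad \text{(T-ASSIGN)}
\\[3ex]
\dfrac{(\tau_1\ \mathbf{ref}^{r_1} + \tau_2\ \mathbf{ref}^{r_2}) \approx (\tau_1'\ \mathbf{ref}^{r_1'} + \tau_2'\ \mathbf{ref}^{r_2'})
        \qquad
        \Theta \mid \mathcal{L} \mid \Gamma[x \hookleftarrow \tau_1'\ \mathbf{ref}^{r_1'}][y \hookleftarrow \tau_2'\ \mathbf{ref}^{r_2'}]
        \vdash e : \tau \Rightarrow \Gamma'}
      {\Theta \mid \mathcal{L} \mid \Gamma[x : \tau_1\ \mathbf{ref}^{r_1}][y : \tau_2\ \mathbf{ref}^{r_2}]
        \vdash \mathbf{alias}(x = y) ; e : \tau \Rightarrow \Gamma'}
\quad \text{(T-ALIAS)}
\\[3ex]
\dfrac{(\tau_1\ \mathbf{ref}^{r_1} + \tau_2\ \mathbf{ref}^{r_2}) \approx (\tau_1'\ \mathbf{ref}^{r_1'} + \tau_2'\ \mathbf{ref}^{r_2'})
        \qquad
        \Theta \mid \mathcal{L} \mid \Gamma[x \hookleftarrow \tau_1'\ \mathbf{ref}^{r_1'}][y \hookleftarrow (\tau_2'\ \mathbf{ref}^{r_2'})\ \mathbf{ref}^{r}]
        \vdash e : \tau \Rightarrow \Gamma'}
      {\Theta \mid \mathcal{L} \mid \Gamma[x : \tau_1\ \mathbf{ref}^{r_1}][y : (\tau_2\ \mathbf{ref}^{r_2})\ \mathbf{ref}^{r}]
        \vdash \mathbf{alias}(x = {*}y) ; e : \tau \Rightarrow \Gamma'}
\quad \text{(T-ALIASPTR)}
\\[3ex]
\dfrac{\Gamma \leq \Gamma' \qquad \Theta \mid \mathcal{L} \mid \Gamma' \vdash e : \tau \Rightarrow \Gamma''
        \qquad \Gamma'', \tau \leq \Gamma''', \tau'}
      {\Theta \mid \mathcal{L} \mid \Gamma \vdash e : \tau' \Rightarrow \Gamma'''}
\quad \text{(T-SUB)}
\\[3ex]
\tau_1 \approx \tau_2 \text{ iff } \bullet \vdash \tau_1 \leq \tau_2 \text{ and } \bullet \vdash \tau_2 \leq \tau_1.
\end{array}
\]

Fig. 8. Pointer manipulation and subtyping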
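\[
\begin{array}{cc}
\dfrac{\Gamma \models \varphi_1 \implies \varphi_2}
      {\Gamma \vdash \{v : \mathbf{int} \mid \varphi_1\} \leq \{v : \mathbf{int} \mid \varphi_2\}}
\ \text{(S-INT)}
&
\dfrac{\forall x \in \mathit{dom}(\Gamma').\ \Gamma \vdash \Gamma(x) \leq \Gamma'(x)}
      {\Gamma \leq \Gamma'}
\ \text{(S-TYENV)}
\\[3ex]
\dfrac{r_1 \geq r_2 \qquad \Gamma \vdash \tau_1 \leq \tau_2}
      {\Gamma \vdash \tau_1\ \mathbf{ref}^{r_1} \leq \tau_2\ \mathbf{ref}^{r_2}}
\ \text{(S-REF)}
&
\dfrac{\Gamma, x : \tau \leq \Gamma', x : \tau' \qquad x \notin \mathit{dom}(\Gamma)}
      {\Gamma, \tau \leq \Gamma', \tau'}
\ \text{(S-RES)}
\end{array}
\]

Fig. 9. Subtyping rules.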


Destructive Updates, Aliasing, and Subtyping. We now discuss the handling 
of assignment, aliasing annotations, and subtyping as described in Figure 8. 
Although apparently unrelated, all three concern updating the refinements of 
(potentially) aliased reference cells. 

Like the binding forms discussed above, T-ASSIGN splits the assigned value’s 
type into two types via the type addition operator, and distributes these types 
between the right hand side of the assignment and the mutated reference contents. 
Refinement information in the fresh contents may be inconsistent with any 
previous refinement information; only the shapes must be the same. In a system 
with unrestricted aliasing, this typing rule would be unsound as it would admit 
writes that are inconsistent with refinements on aliases of the left hand side. 
However, the assignment rule requires that the updated reference has an ownership
of 1. By the ownership type invariant, all aliases of the updated reference have 0
ownership, and by ownership well-formedness may only contain the ⊤ refinement.


Example 3. We can type the program as follows:

let x = mkref 5 in      // x : {v:int | v = 5} ref^1
let y = x in            // x : ⊤_1, y : {v:int | v = 5} ref^1
y := 4; assert(*y = 4)  // x : ⊤_1, y : {v:int | v = 4} ref^1

In this and later examples, we include type annotations within comments. We
stress that these annotations are for expository purposes only; our tool can infer
these types automatically with no manual annotations.
As described thus far, the type system is quite strict: if ownership has been
completely transferred from one reference to another, the refinement information
found in the original reference is effectively useless. Additionally, once a mutable
pointer has been split through an assignment or let expression, there is no
way to recover mutability. The typing rules for must-alias assertions, T-ALIAS
and T-ALIASPTR, overcome this restriction by exploiting the must-aliasing
information to “shuffle” or redistribute ownerships and refinements between two
aliased pointers. The typing rule assigns two fresh types τ1′ ref^r1′ and τ2′ ref^r2′ to
the two operand pointers. The choice of τ1′, r1′, τ2′, and r2′ is left open provided
that the sum of the new types, (τ1′ ref^r1′) + (τ2′ ref^r2′), is equivalent (denoted ≈)
to the sum of the original types. Formally, ≈ is defined as in Figure 8; it implies
that any refinements in the two types must be logically equivalent and that
ownerships must also be equal. This redistribution is sound precisely because the
two references are assumed to alias; the total ownership for the single memory
cell pointed to by both references cannot be increased by this shuffling. Further,
any refinements that hold for the contents of one reference must necessarily hold
for contents of the other and vice versa.


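Example 4 (Shuffling ownerships and refinements). Let $\varphi_{=n}$ be $\nu = n$.

let x = mkref 5 in        // x: {ν:int | φ_{=5}} ref^1
let y = x in              // x: ⊤_1, y: {ν:int | φ_{=5}} ref^1
y := 4; alias(x = y)      // x: {ν:int | φ_{=4}} ref^0.5, y: {ν:int | φ_{=4}} ref^0.5

The final type assignment for x and y is justified by

\[
\begin{aligned}
\top_1 + \{\nu:\mathbf{int} \mid \varphi_{=4}\}\ \mathbf{ref}^{1}
  &= \{\nu:\mathbf{int} \mid \top \wedge \varphi_{=4}\}\ \mathbf{ref}^{1} \\
  &\approx \{\nu:\mathbf{int} \mid \varphi_{=4} \wedge \varphi_{=4}\}\ \mathbf{ref}^{1} \\
  &= \{\nu:\mathbf{int} \mid \varphi_{=4}\}\ \mathbf{ref}^{0.5} + \{\nu:\mathbf{int} \mid \varphi_{=4}\}\ \mathbf{ref}^{0.5}.
\end{aligned}
\]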
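
The arithmetic behind this justification can be replayed mechanically. The following Python sketch is illustrative only and is not taken from the CONSORT implementation: the names `RefType`, `add`, `well_formed`, and `valid_shuffle` are hypothetical, refinements are modeled as sets of conjunct strings, and the equivalence check is approximated by equality of conjunct sets rather than logical equivalence.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class RefType:
    """Models {v:int | refinement} ref^ownership.

    `refinement` is a frozenset of conjuncts; the empty set plays the role of top.
    """
    refinement: frozenset
    ownership: Fraction


def add(t1, t2):
    """Type addition: conjoin the refinements and sum the ownerships."""
    return RefType(t1.refinement | t2.refinement, t1.ownership + t2.ownership)


def well_formed(t):
    """Ownership lies in [0, 1]; ownership 0 permits only the trivial refinement."""
    return 0 <= t.ownership <= 1 and (t.ownership > 0 or not t.refinement)


def valid_shuffle(old1, old2, new1, new2):
    """Accept a redistribution only if both sums describe the same type."""
    return (well_formed(new1) and well_formed(new2)
            and add(old1, old2) == add(new1, new2))


# Types of x and y just before alias(x = y) in Example 4.
top_x = RefType(frozenset(), Fraction(0))
full_y = RefType(frozenset({"v = 4"}), Fraction(1))
half = RefType(frozenset({"v = 4"}), Fraction(1, 2))

print(valid_shuffle(top_x, full_y, half, half))      # split ownership evenly
print(valid_shuffle(top_x, full_y, full_y, full_y))  # would double the ownership
```

Under these assumptions the even split is accepted, while giving both names full ownership is rejected because the total ownership of the shared cell would grow from 1 to 2.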


The aliasing rules give fine-grained control over ownership information. This 
flexibility allows mutation through two or more aliased references within the 
same scope. Provided sufficient aliasing annotations, the type system may shuffle 
ownerships between one or more live references, enabling and disabling mutability 
as required. Although the reliance on these annotations appears to decrease the 
practicality of our type system, we expect these aliasing annotations can be 
inserted by a conservative must-aliasing analysis. Further, empirical experience 
from our prior work [56] indicates that only a small number of annotations are 
required for larger programs. 


Example 5 (Shuffling Mutability). Let $\varphi_{=n}$ again be $\nu = n$. The following
program uses two live, aliased references to mutate the same memory location:

let x = mkref 0 in
let y = x in              // x: {ν:int | φ_{=0}} ref^1, y: ⊤_1
x := 1; alias(x = y);     // x: ⊤_1, y: {ν:int | φ_{=1}} ref^1
y := 2; alias(x = y);     // x: {ν:int | φ_{=2}} ref^0.5, y: {ν:int | φ_{=2}} ref^0.5
assert(*x = 2)
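
\[
\dfrac{
\begin{array}{c}
\Theta(f) = \forall\lambda.\ \langle x_1:\tau_1,\dots,x_n:\tau_n\rangle \to \langle x_1:\tau_1',\dots,x_n:\tau_n' \mid \tau\rangle \\
\sigma_\alpha = [\ell : L/\lambda] \qquad \sigma_x = [y_1/x_1]\cdots[y_n/x_n] \\
\Theta \mid L \mid \Gamma[y_i \leftarrow \sigma_\alpha\,\sigma_x\,\tau_i'],\ x : \sigma_\alpha\,\sigma_x\,\tau \vdash e : \tau' \Rightarrow \Gamma' \qquad x \notin dom(\Gamma')
\end{array}
}{
\Theta \mid L \mid \Gamma[y_i : \sigma_\alpha\,\sigma_x\,\tau_i] \vdash \mathbf{let}\ x = f^{\ell}(y_1,\dots,y_n)\ \mathbf{in}\ e : \tau' \Rightarrow \Gamma'
}\ \text{(T-CALL)}
\]

\[
\dfrac{
\begin{array}{c}
\Theta(f) = \forall\lambda.\ \langle x_1:\tau_1,\dots,x_n:\tau_n\rangle \to \langle x_1:\tau_1',\dots,x_n:\tau_n' \mid \tau\rangle \\
\Theta \mid \lambda \mid x_1:\tau_1,\dots,x_n:\tau_n \vdash e : \tau \Rightarrow x_1:\tau_1',\dots,x_n:\tau_n'
\end{array}
}{
\Theta \vdash f \mapsto (x_1,..,x_n)e
}\ \text{(T-FUNDEF)}
\]

\[
\dfrac{
\forall f \mapsto (x_1,..,x_n)e \in D.\ \Theta \vdash f \mapsto (x_1,..,x_n)e \qquad dom(D) = dom(\Theta)
}{
\Theta \vdash D
}\ \text{(T-FUNS)}
\qquad
\dfrac{
\Theta \vdash D \qquad \vdash_{WF} \Theta \qquad \Theta \mid \epsilon \mid \bullet \vdash e : \tau \Rightarrow \Gamma
}{
\vdash \langle D, e\rangle
}\ \text{(T-PROG)}
\]

Fig. 10. Program typing rules
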


After the first aliasing statement the type system shuffles the (exclusive) mutability
between x and y to enable the write to y. After the second aliasing statement
the ownership in y is split with x; note that transferring all ownership from y to
x would also yield a valid typing.


Finally, we describe the subtyping rule. The rules for subtyping types and
environments are shown in Figure 9. For integer types, the rules require that the
refinement of the supertype be a logical consequence of the subtype’s refinement
conjoined with the lifting of $\Gamma$. The subtype rule for references is covariant in
the type of reference contents. It is widely known that in a language with
unrestricted aliasing and mutable references such a rule is unsound: after a write
into the coerced pointer, reads from an alias may yield a value disallowed by
the alias’ type [43]. However, as in the assign case, ownership types prevent
unsoundness; a write to the coerced pointer requires the pointer to have ownership
1, which guarantees any aliased pointers have the maximal type and provide no
information about their contents beyond simple types.


3.3 Interprocedural Fragment and Context-Sensitivity 


We now turn to a discussion of the interprocedural fragment of our language, 
and how our type system propagates context information. The remaining typing 
rules for our language are shown in Figure 10. These rules concern the typing of 
function calls, function bodies, and entire programs. 

We first explain the T-CALL rule. The rule uses two substitution maps. $\sigma_x$
translates between the parameter names used in the function type and actual
argument names at the call-site. $\sigma_\alpha$ instantiates all occurrences of $\lambda$ in the callee
type with $\ell : L$, where $\ell$ is the label of the call-site and $L$ the typing context of
the call. The types of the arguments $y_i$ are required to match the parameter
types (post substitution). The body of the let binding is then checked with
the argument types updated to reflect the changes in the function call (again,
post substitution). This update is well-defined because we require all function
arguments be distinct as described in Section 2.1. Intuitively, the substitution $\sigma_\alpha$
represents incrementally refining the behavior of the callee function with partial
context information. If $L$ is itself a context variable $\lambda'$, this substitution effectively
transforms any context prefix queries over $\lambda$ in the argument/return/output
types into queries over $\ell : \lambda'$. In other words, while the exact concrete execution
context of the callee is unknown, the context must at least begin with $\ell$, which
can potentially rule out certain behaviors.

Rule T-FUNDEF type checks a function definition $f \mapsto (x_1,..,x_n)e$ against
the function type given in $\Theta$. As a convenience we assume that the parameter
names in the function type match the formal parameters in the function definition.
The rule checks that under an initial environment given by the argument types the
function body produces a value of the return type and transforms the arguments
according to the output types. As mentioned above, functions may be executed
under many different contexts, so type checking the function body is performed
under the context variable $\lambda$ that occurs in the function type.

Finally, the rule for typing programs (T-PROG) checks that all function
definitions are well typed under a well-formed function type environment, and
that the entry point e is well typed in an empty type environment and the typing
context $\epsilon$, i.e., the initial context.


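Example 6 (1-CFA). Recall the program in Figure 3 in Section 1; assume the
function calls are labeled as follows:

p := get^ℓ1(p) + 1;
// ...
q := get^ℓ2(q) + 1;

Taking $\tau_p$ to be the type shown in Example 2:
\[
\{\nu:\mathbf{int} \mid (\ell_1 \preceq \lambda \implies \nu = 3) \wedge (\ell_2 \preceq \lambda \implies \nu = 5)\}
\]
we can give get the type $\forall\lambda.\ \langle z : \tau_p\ \mathbf{ref}^1\rangle \to \langle z : \tau_p\ \mathbf{ref}^1 \mid \tau_p\rangle$.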


Example 7 (2-CFA). To see how context information propagates across multiple
calls, consider the following change to the code considered in Example 6:

get_real(z) { *z }
get(z) { get_real^ℓ3(z) }

The type of get remains as in Example 6, and taking $\tau$ to be
\[
\{\nu:\mathbf{int} \mid (\ell_3\,\ell_1 \preceq \lambda' \implies \nu = 3) \wedge (\ell_3\,\ell_2 \preceq \lambda' \implies \nu = 5)\}
\]
the type of get_real is: $\forall\lambda'.\ \langle z : \tau\ \mathbf{ref}^1\rangle \to \langle z : \tau\ \mathbf{ref}^1 \mid \tau\rangle$.
We focus on the typing of the call to get_real in get; it is typed in context
$\lambda$ and a type environment where p is given type $\tau_p$ from Example 6.
Applying the substitution $[\ell_3 : \lambda/\lambda']$ to the argument type of get_real yields:
\[
\begin{aligned}
&\{\nu:\mathbf{int} \mid (\ell_3\,\ell_1 \preceq \ell_3 : \lambda \implies \nu = 3) \wedge (\ell_3\,\ell_2 \preceq \ell_3 : \lambda \implies \nu = 5)\}\ \mathbf{ref}^1 \\
\approx\ &\{\nu:\mathbf{int} \mid (\ell_1 \preceq \lambda \implies \nu = 3) \wedge (\ell_2 \preceq \lambda \implies \nu = 5)\}\ \mathbf{ref}^1
\end{aligned}
\]
which is exactly the type of p. A similar derivation applies to the return type of
get_real and thus get.
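
The simplification of prefix queries used in this example can be illustrated with a small Python sketch. It is not part of CONSORT; the function names `holds` and `push_call_site` are hypothetical, call-site labels are modeled as strings, and a context is modeled as a tuple of labels with the most recent call site first.

```python
def holds(query, context):
    """Prefix query: does the concrete context start with the labels in `query`?"""
    return tuple(context[:len(query)]) == tuple(query)


def push_call_site(query, label):
    """Rewrite a query over the callee context into a query over the caller context.

    The callee context is assumed to be `label` followed by the (unknown)
    caller context. Returns the residual query over the caller context, or
    None when the query is already known to be false.
    """
    if not query:
        return ()            # the empty prefix holds in every context
    if query[0] != label:
        return None          # first label disagrees with the call site
    return tuple(query[1:])  # the matching head is discharged


# Queries in the type of get_real, rewritten for the call labeled l3 in get.
print(push_call_site(("l3", "l1"), "l3"))  # residual query over get's context
print(push_call_site(("l3", "l2"), "l3"))
print(push_call_site(("l1",), "l3"))       # cannot hold under this call site
```

The residual queries are the ones that appear in the type of p in Example 6, which is why the two reference types above are equivalent.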


3.4 Soundness 


We have proven that any program that type checks according to the rules above 
will never experience an assertion failure. We formalize this claim with the 
following soundness theorem. 


Theorem 1 (Soundness). If $\vdash \langle D, e\rangle$, then $\langle \emptyset, \emptyset, \cdot, e\rangle \not\longrightarrow^{*} \mathbf{AssertFail}$.

Further, any well-typed program either diverges, halts in the configuration
AliasFail, or halts in a configuration $\langle H, R, \cdot, x\rangle$ for some H, R and x, i.e.,
evaluation does not get stuck.


Proof (Sketch). By standard progress and preservation lemmas; the full proof 
has been omitted for space reasons and can be found in the full version [60]. 


4 Inference and Extensions 


We now briefly describe the inference algorithm implemented in our tool
CONSORT. We sketch some implemented extensions needed to type more interesting
programs and close with a discussion of current limitations of our prototype.


4.1 Inference 


Our tool first runs a standard, simple type inference algorithm to generate type
templates for every function parameter type, return type, and for every live
variable at each program point. For a variable x of simple type $\tau_S ::= \mathbf{int} \mid \tau_S\ \mathbf{ref}$
at program point p, CONSORT generates a type template $[\![\tau_S]\!]_{x,0,p}$ as follows:

\[
[\![\mathbf{int}]\!]_{x,n,p} = \{\nu:\mathbf{int} \mid \varphi_{x,n,p}(\nu; FV_p)\}
\qquad
[\![\tau_S\ \mathbf{ref}]\!]_{x,n,p} = [\![\tau_S]\!]_{x,n+1,p}\ \mathbf{ref}^{r_{x,n,p}}
\]

$\varphi_{x,n,p}(\nu; FV_p)$ denotes a fresh relation symbol applied to $\nu$ and the free variables
of simple type int at program point p (denoted $FV_p$). $r_{x,n,p}$ is a fresh ownership
variable. For each function f, there are two synthetic program points, $f^b$ and $f^e$,
for the beginning and end of the function respectively. At both points, CONSORT
generates a type template for each argument, where $FV_{f^b}$ and $FV_{f^e}$ are the names
of integer typed parameters. At $f^e$, CONSORT also generates a type template
for the return value. We write $\Gamma^p$ to indicate the type environment at point p,
where every variable is mapped to its corresponding type template. $[\![\Gamma^p]\!]$ is thus
equivalent to $\bigwedge_{x \in FV_p} \varphi_{x,0,p}(x; FV_p)$.

When generating these type templates, our implementation also generates
ownership well-formedness constraints. Specifically, for a type template of the form
$\{\nu:\mathbf{int} \mid \varphi_{x,n+1,p}(\nu; FV_p)\}\ \mathbf{ref}^{r_{x,n,p}}$ CONSORT emits the constraint
$r_{x,n,p} = 0 \implies \varphi_{x,n+1,p}(\nu; FV_p)$, and for a type template
$(\tau\ \mathbf{ref}^{r_{x,n+1,p}})\ \mathbf{ref}^{r_{x,n,p}}$ CONSORT emits the constraint
$r_{x,n,p} = 0 \implies r_{x,n+1,p} = 0$.
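
A minimal Python sketch of template generation and the accompanying well-formedness constraints is given below. It is an illustration under stated assumptions, not CONSORT's implementation: simple types are encoded as the string `"int"` or a pair `("ref", inner)`, relation symbols and ownership variables are plain strings, constraints are rendered as text, and the function names `template` and `wf_constraints` are hypothetical.

```python
def template(simple_type, x, p, n=0):
    """Build the type template for variable `x` at program point `p`.

    An integer at nesting depth n gets a fresh relation symbol phi_x_n_p;
    a reference at depth n gets a fresh ownership variable r_x_n_p.
    """
    if simple_type == "int":
        return ("int", f"phi_{x}_{n}_{p}")
    _, inner = simple_type
    return ("ref", template(inner, x, p, n + 1), f"r_{x}_{n}_{p}")


def wf_constraints(t):
    """Emit one well-formedness implication per reference layer of a template."""
    out = []
    while t[0] == "ref":
        _, inner, r = t
        if inner[0] == "int":
            # Zero ownership forces the content refinement to be trivially true.
            out.append(f"{r} = 0 => {inner[1]}")
        else:
            # Zero ownership forces zero ownership one layer down.
            out.append(f"{r} = 0 => {inner[2]} = 0")
        t = inner
    return out


t = template(("ref", ("ref", "int")), "x", "p")
print(t)
for c in wf_constraints(t):
    print(c)
```

For a variable of simple type int ref ref this produces two ownership variables, one relation symbol, and two implications, one per reference layer.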

CONSORT then walks the program, generating constraints between relation
symbols and ownership variables according to the typing rules. These constraints
take three forms: ownership constraints, subtyping constraints, and assertion
constraints. Ownership constraints are simple linear (in)equalities over ownership
variables and constants, according to conditions imposed by the typing rules.
For example, if variable x has the type template $\tau\ \mathbf{ref}^{r_{x,0,p}}$ for the expression
x := y; e at point p, CONSORT generates the constraint $r_{x,0,p} = 1$.

CONSORT emits subtyping constraints between the relation symbols at
related program points according to the rules of the type system. For example, for
the term let x = y in e at program point p (where e is at program point p', and x
has simple type int ref) CONSORT generates the following subtyping constraint:

\[
[\![\Gamma^p]\!] \wedge \varphi_{y,1,p}(\nu; FV_p) \implies \varphi_{y,1,p'}(\nu; FV_{p'}) \wedge \varphi_{x,1,p'}(\nu; FV_{p'})
\]

in addition to the ownership constraint $r_{y,0,p} = r_{y,0,p'} + r_{x,0,p'}$.

Finally, for each assert($\varphi$) in the program, CONSORT emits an assertion
constraint of the form $[\![\Gamma^p]\!] \implies \varphi$, which requires that the refinements on integer
typed variables in scope be sufficient to prove $\varphi$.


Encoding Context Sensitivity. To make inference tractable, we require the user
to fix a priori the maximum length of prefix queries to a constant k (this choice
is easily controlled with a command line parameter to our tool). We supplement
the arguments in every predicate application with a set of integer context
variables $c_1, \dots, c_k$; these variables do not overlap with any program variables.
CONSORT uses these variables to infer context sensitive refinements as
follows. Consider a function call let x = $f^{\ell}(y_1, \dots, y_n)$ in e at point p where e
is at point p'. CONSORT generates the following constraint for a refinement
$\varphi_{y_i,n,p}(\nu; c_1, \dots, c_k; FV_p)$ which occurs in the type template of $y_i$:

\[
\begin{aligned}
\varphi_{y_i,n,p}(\nu, c_0, \dots, c_k; FV_p) &\implies \sigma_x\,\varphi_{x_i,n,f^b}(\nu, \ell, c_0, \dots, c_{k-1}; FV_{f^b}) \\
\sigma_x\,\varphi_{x_i,n,f^e}(\nu, \ell, c_0, \dots, c_{k-1}; FV_{f^e}) &\implies \varphi_{y_i,n,p'}(\nu, c_0, \dots, c_k; FV_{p'}) \\
\sigma_x &= [y_1/x_1] \cdots [y_n/x_n]
\end{aligned}
\]

Effectively, we have encoded $\ell_1 \dots \ell_k \preceq \lambda$ as $\bigwedge_{0 < i \le k} c_i = \ell_i$. In the above, the
shift from $c_0, \dots, c_k$ to $\ell, c_0, \dots, c_{k-1}$ plays the role of $\sigma_\alpha$ in the T-CALL rule.
The above constraint serves to determine the value of $c_0$ within the body of the
function f. If f calls another function g, the above rule propagates this value of
$c_0$ to $c_1$ within g and so on. The solver may then instantiate relation symbols
with predicates that are conditional over the values of $c_i$.
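
The shift of context variables at a call can be pictured with the following Python sketch. It is only an illustration: the function name `callee_context` is hypothetical, labels are modeled as positive integers, and the value 0 is an assumed placeholder meaning that no call site has been recorded in that position.

```python
def callee_context(call_label, caller_ctx):
    """Shift the caller's context variables by one position.

    The label of the call site becomes the newest entry and the oldest
    entry is dropped, so the number of context variables stays fixed.
    """
    return (call_label,) + tuple(caller_ctx[:-1])


# Two context variables; the entry point starts with no recorded call sites.
main_ctx = (0, 0)
get_ctx = callee_context(1, main_ctx)      # main calls get at label 1
get_real_ctx = callee_context(3, get_ctx)  # get calls get_real at label 3
print(get_ctx, get_real_ctx)
```

A refinement inside get_real can then be conditional on the pair of values it observes, which is how a bounded prefix of the call string is made visible to the solver.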


Solving Constraints. The results of the above process are two systems of
constraints: real arithmetic constraints over ownership variables and constrained Horn
clauses (CHC) over the refinement relations. Under certain assumptions about the
simple types in a program, the size of the ownership and subtyping constraints will
be polynomial in the size of the program. These systems are not independent; the
relation constraints may mention the value of ownership variables due to the
well-formedness constraints described above. The ownership constraints are first solved
with Z3 [16]. These constraints are non-linear but Z3 appears particularly
well-engineered to quickly find solutions for the instances generated by CONSORT. We
constrain Z3 to maximize the number of non-zero ownership variables to ensure as
few refinements as possible are constrained to be $\top$ by ownership well-formedness.
The values of ownership variables inferred by Z3 are then substituted into the
constrained Horn clauses, and the resulting system is checked for satisfiability
with an off-the-shelf CHC solver. Our implementation generates constraints in
the industry standard SMT-Lib2 format [8]; any solver that accepts this format
can be used as a backend for CONSORT. Our implementation currently supports
Spacer [37] (part of the Z3 solver [16]), HoIce [13], and Eldarica [48] (adding a
new backend requires only a handful of lines of glue code). We found that different
solvers are better tuned to different problems; we also implemented a parallel mode
which runs all supported solvers in parallel, using the first available result.
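
One way to realize such a parallel mode is sketched below in Python. This is an assumption-laden illustration rather than CONSORT's actual code: the backends are stand-in Python functions (`slow_backend`, `fast_backend`) instead of real solver processes, and the helper name `first_result` is hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def first_result(solvers, problem):
    """Run every backend on the same problem and return the first answer.

    `solvers` maps a backend name to a callable taking the problem text.
    Returns a pair (backend name, answer).
    """
    with ThreadPoolExecutor(max_workers=len(solvers)) as pool:
        futures = {pool.submit(fn, problem): name for name, fn in solvers.items()}
        for fut in as_completed(futures):
            return futures[fut], fut.result()


def slow_backend(problem):
    time.sleep(0.3)   # stand-in for a solver that needs longer on this input
    return "sat"


def fast_backend(problem):
    time.sleep(0.01)  # stand-in for a solver well tuned to this input
    return "sat"


name, answer = first_result({"slow": slow_backend, "fast": fast_backend},
                            "(check-sat)")
print(name, answer)
```

In this sketch the thread pool still waits for the slower stand-in to finish before returning; a real driver that launches external solver processes would more likely terminate the remaining processes once one has answered.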


4.2 Extensions 


Primitive Operations. As defined in Section 2, our language can compare integers
to zero and load and store them from memory, but can perform no meaningful
computation over these numbers. To promote the flexibility of our type system
and simplify our soundness statement, we do not fix a set of primitive operations
and their static semantics. Instead, we assume any set of primitive operations
used in a program are given sound function types in $\Theta$. For example, under the
assumption that + has its usual semantics and the underlying logic supports +, we
can give + the type $\forall\lambda.\ \langle x : \top_0, y : \top_0\rangle \to \langle x : \top_0, y : \top_0 \mid \{\nu:\mathbf{int} \mid \nu = x + y\}\rangle$.
Interactions with a nondeterministic environment or unknown program inputs
can then be modeled with a primitive that returns integers refined with $\top$.


Dependent Tuples. Our implementation supports types of the form
$(x_1:\tau_1, \dots, x_n:\tau_n)$, where $x_i$ can appear within $\tau_j$ ($j \neq i$) if $\tau_i$ is an integer type. For
example, $(x:\{\nu:\mathbf{int} \mid \top\}, y:\{\nu:\mathbf{int} \mid \nu > x\})$ is the type of tuples whose second
element is strictly greater than the first. We also extend the language with tuple
constructors as a new value form, and let bindings with tuple patterns as the LHS.

The extension to type checking is relatively straightforward; the only significant
extensions are to the subtyping rules. Specifically, the subtyping check for a
tuple element $x_i:\tau_i$ is performed in a type environment elaborated with the types
and names of other tuple elements. The extension to type inference is also
straightforward; the arguments for a predicate symbol include any enclosing dependent
tuple names and the environment in subtyping constraints is likewise extended.


Recursive Types. Our language also supports some unbounded heap structures
via recursive reference types. To keep inference tractable, we forbid nested
recursive types, multiple occurrences of the recursive type variable, and additionally
fix the shape of refinements that occur within a recursive type. For recursive
refinements that fit the above restriction, our approach for refinements is broadly
similar to that in [35], and we use the ownership scheme of [56] for handling
ownership. We first use simple type inference to infer the shape of the recursive
types, and automatically insert fold/unfold annotations into the source program.
As in [35], the refinements within an unfolding of a recursive type may refer to
dependent tuple names bound by the enclosing type. These recursive types can
express, e.g., the invariants of a mutable, sorted list. As in [56], recursive types
are unfolded once before assigning ownership variables; further unfoldings copy
existing ownership variables.

As in Java or C++, our language does not support sum types, and any
instantiation of a recursive type must use a null pointer. Our implementation
supports an ifnull construct in addition to a distinguished null constant. Our
implementation allows any refinement to hold for the null constant, including
$\bot$. Currently, our implementation does not detect null pointer dereferences, and
all soundness guarantees are made modulo freedom of null dereferences. As $[\![\Gamma]\!]$
omits refinements under reference types, null pointer refinements do not affect
the verification of programs without null pointer dereferences.


Arrays. Our implementation supports arrays of integers. Each array is given an 
ownership describing the ownership of memory allocated for the entire array. The 
array type contains two refinements: the first refines the length of the array itself, 
and the second refines the entire array contents. The content refinement may 
refer to a symbolic index variable for precise, per-index refinements. At reads 
and writes to the array, CONSORT instantiates the refinement’s symbolic index 
variable with the concrete index used at the read/write. 
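
The instantiation step can be illustrated with a short Python sketch. Note the assumptions: CONSORT performs this check statically through its constraint solver, whereas the sketch checks values at run time purely to make the substitution visible; the names `array_inv`, `instantiate`, and `write` are hypothetical, and the invariant is modeled on the Array-Inv benchmark, where index i holds the value i.

```python
from functools import partial


def array_inv(i, v):
    """Content refinement with symbolic index i: the element at index i equals i."""
    return v == i


def instantiate(refinement, index):
    """Substitute a concrete index for the symbolic one.

    The result is a predicate over the element value alone.
    """
    return partial(refinement, index)


def write(arr, index, value, refinement):
    """Perform the write only if the instantiated refinement accepts the value."""
    if not instantiate(refinement, index)(value):
        raise ValueError(f"value {value} violates the refinement at index {index}")
    arr[index] = value


arr = [0] * 4
for i in range(4):
    write(arr, i, i, array_inv)
print(arr)
```

A write such as `write(arr, 2, 5, array_inv)` is rejected in this model, mirroring how a statically inferred per-index refinement would fail to be established.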

As in [56], our restriction to arrays of integers stems from the difficulty of 
ownership inference. Soundly handling pointer arrays requires index-wise tracking 
of ownerships which significantly complicates automated inference. We leave 
supporting arrays of pointers to future work. 


4.3 Limitations 


Our current approach is not complete; there are safe programs that will be rejected 
by our type system. As mentioned in Section 3.1, our well-formedness condition 
forbids refinements that refer to memory locations. As a result, CONSORT 
cannot in general express, e.g., that the contents of two references are equal. 
Further, due to our reliance on automated theorem provers we are restricted to 
logics with sound but potentially incomplete decision procedures. CONSORT 
also does not support conditional or context-sensitive ownerships, and therefore 
cannot precisely handle conditional mutation or aliasing. 


5 Experiments 


We now present the results of preliminary experiments performed with the
implementation described in Section 4. The goal of these experiments was to answer the
following questions: i) is the type system (and extensions of Section 4) expressive
enough to type and verify non-trivial programs? and ii) is type inference feasible?

Table 1. Description of benchmark suite adapted from JayHorn. Java are programs
that test Java-specific features. Inc are tests that cannot be handled by CONSORT, e.g.,
null checking, etc. Bug includes a “safe” program we discovered was actually incorrect.

Set      Orig.  Adapted  Java  Inc  Bug
Safe     41     32       6     2    1
Unsafe   41     26       13    2    0

To answer these questions, we evaluated our prototype implementation on two
sets of benchmarks.⁴ The first set is adapted from JayHorn [32, 33], a verification
tool for Java. This test suite contains a combination of 82 safe and unsafe
programs written in Java. We chose this benchmark suite as, like CONSORT,
JayHorn is concerned with the automated verification of programs in a language
with mutable, aliased memory cells. Further, although some of their benchmark
programs tested Java-specific features, most could be adapted into our low-level
language. The tests we could adapt provide a comparison with existing
state-of-the-art verification techniques. A detailed breakdown of the adapted benchmark
suite can be found in Table 1.


Remark 2. The original JayHorn paper includes two additional benchmark sets, 
Mine Pump and CBMC. Both our tool and recent JayHorn versions time out on 
the Mine Pump benchmark. Further, the CBMC tests were either subsumed by 
our own test programs, tested Java specific features, or tested program synthesis 
functionality. We therefore omitted both of these benchmarks from our evaluation. 


The second benchmark set consists of data structure implementations and 
microbenchmarks written directly in our low-level imperative language. We 
developed this suite to test the expressive power of our type system and inference. 
The programs included in this suite are: 


- Array-List: Implementation of an unbounded list backed by an array.
- Sorted-List: Implementation of a mutable, sorted list maintained with an
  in-place insertion sort algorithm.
- Shuffle: Multiple live references are used to mutate the same location in
  program memory as in Example 5.
- Mut-List: Implementation of general linked lists with a clear operation.
- Array-Inv: A program which allocates a length n array and writes the value
  i at every index i.
- Intro2: The motivating program shown in Figure 2 in Section 1.


⁴ Our experiments and the CONSORT source code are available at
https://www.fos.kuis.kyoto-u.ac.jp/projects/consort/.

Table 2. Comparison of CONSORT to JayHorn on the benchmark set of [32] (top)
and our custom benchmark suite (bottom). T/O indicates a time out.

                    ConSORT          JayHorn
Set      N. Tests   Correct  T/O     Correct  T/O  Imp.
Safe     32         29       3       24       5    3
Unsafe   26         26       0       19       0    7

Name         Safe?  Time(s)  Ann  JH   | Name             Safe?  Time(s)  Ann  JH
Array-Inv    ✓      10.07    0    T/O  | Array-Inv-BUG    ✗      5.29     0    T/O
Array-List   ✓      16.76    0    T/O  | Array-List-BUG   ✗      1.13     0    T/O
Intro2       ✓      0.08     0    T/O  | Intro2-BUG       ✗      0.02     0    T/O
Mut-List     ✓      1.45     3    T/O  | Mut-List-BUG     ✗      0.41     3    T/O
Shuffle      ✓      0.13     3    ✓    | Shuffle-BUG      ✗      0.07     3    ✗
Sorted-List  ✓      1.90     3    T/O  | Sorted-List-BUG  ✗      1.10     3    T/O


We introduced unsafe mutations to these programs to check our tool for
unsoundness and translated these programs into Java for further comparison with JayHorn.
Our benchmarks and JayHorn’s require a small number of trivially identified
alias annotations. The adapted JayHorn benchmarks contain a total of 6
annotations; the most for any individual test was 3. The number of annotations
required for our benchmark suite is shown in column Ann. of Table 2.

We first ran CONSORT on each program in our benchmark suite and ran 
version 0.7 of JayHorn on the corresponding Java version. We recorded the final 
verification result for both our tool and JayHorn. We also collected the end-to-end 
runtime of CONSORT for each test; we do not give a performance comparison 
with JayHorn given the many differences in target languages. For the JayHorn 
suite, we first ran our tool on the adapted version of each test program and ran 
JayHorn on the original Java version. We also did not collect runtime information 
for this set of experiments because our goal is a comparison of tool precision, not 
performance. All tests were run on a machine with 16 GB RAM and 4 Intel i5 
CPUs at 2GHz and with a timeout of 60 seconds (the same timeout was used in 
[32]). We used CONSORT’s parallel backend (Section 4) with Z3 version 4.8.4,
HoIce version 1.8.1, and Eldarica version 2.0.1 and JayHorn’s Eldarica backend.


5.1 Results 


The results of our experiments are shown in Table 2. On the JayHorn benchmark
suite CONSORT performs competitively with JayHorn, correctly identifying 29
of the 32 safe programs as such. For all 3 tests on which CONSORT timed out
after 60 seconds, JayHorn also timed out (column T/O). For the unsafe programs,
CONSORT correctly identified all programs as unsafe within 60 seconds; JayHorn
answered UNKNOWN for 7 tests (column Imp.).

On our own benchmark set, CONSORT correctly verifies all safe versions of
the programs within 60 seconds. For the unsafe variants, CONSORT was able to
quickly and definitively determine these programs unsafe. JayHorn times out on
all tests except for Shuffle and Shuffle-BUG (column JH). We investigated the
cause of time outs and discovered that after verification failed with an unbounded
heap model, JayHorn attempts verification on increasingly larger bounded heaps.
In every case, JayHorn exceeded the 60 second timeout before reaching a
preconfigured limit on the heap bound. This result suggests JayHorn struggles in
the presence of per-object invariants and unbounded allocations; the only two
tests JayHorn successfully analyzed contain just a single object allocation.

We do not believe this struggle is indicative of a shortcoming in JayHorn’s
implementation, but stems from the fundamental limitations of JayHorn’s memory
representation. Like many verification tools (see Section 6), JayHorn uses a single,
unchanging invariant for every object allocated at the same syntactic location;
effectively, all objects allocated at the same location are assumed to alias with one
another. This representation cannot, in general, handle programs with different
invariants for distinct objects that evolve over time. We hypothesize other tools
that adopt a similar approach will exhibit the same difficulty.


6 Related Work 


The difficulty in handling programs with mutable references and aliasing has been
well-studied. Like JayHorn, many approaches model the heap explicitly at
verification time, approximating concrete heap locations with allocation site labels
[14, 20, 32, 33, 46]; each abstract location is also associated with a refinement. As
abstract locations summarize many concrete locations, this approach does not in
general admit strong updates and flow-sensitivity; in particular, the refinement
associated with an abstract location is fixed for the lifetime of the program. The
techniques cited above include various workarounds for this limitation. For
example, [14, 46] temporarily allow breaking these invariants through a distinguished
program name as long as the abstract location is not accessed through another
name. The programmer must therefore eventually bring the invariant back in
sync with the summary location. As a result, these systems ultimately cannot
precisely handle programs that require evolving invariants on mutable memory.

A similar approach was taken in CQual [23] by Aiken et al. [2]. They used 
an explicit restrict binding for pointers. Strong updates are permitted through 
pointers bound with restrict, but the program is forbidden from using any pointers 
which share an allocation site while the restrict binding is live. 

A related technique used in the field of object-oriented verification is to declare
object invariants at the class level and allow these invariants on object fields to be
broken during a limited period of time [7, 22]. In particular, the work on Spec#
[7] uses an ownership system which tracks whether object a owns object b; like
CONSORT’s ownership system, these ownerships contain the effects of mutation.
However, Spec#’s ownership is quite strict and does not admit references to b
outside of the owning object a.

Viper [30, 42] (and its related projects [31, 39]) uses access annotations
(expressed as permission predicates) to explicitly transfer access/mutation
permissions for references between static program names. Like CONSORT, permissions
may be fractionally transferred, allowing temporary shared, immutable access to
a mutable memory cell. However, while CONSORT automatically infers many
ownership transfers, Viper requires extensive annotations for each transfer.

F*, a dependently typed dialect of ML, includes an update/select theory of 
heaps and requires explicit annotations summarizing the heap effects of a method 
[44, 57, 58]. This approach enables modular reasoning and precise specification of 
pre- and post-conditions with respect to the heap, but precludes full automation. 

The work on rely-guarantee reference types by Gordon et al. [26, 27] uses
refinement types in a language with mutable references and aliasing. Their approach
extends reference types with rely/guarantee predicates; the rely predicate describes
possible mutations via aliases, and the guarantee predicate describes the
admissible mutations through the current reference. If two references may alias, then the
guarantee predicate of one reference implies the rely predicate of the other and
vice versa. This invariant is maintained with a splitting operation that is similar
to our + operator. Further, their type system allows strong updates to reference
refinements provided the new refinements are preserved by the rely predicate.
Thus, rely-guarantee reference types support multiple mutable, aliased references
with non-trivial refinement information. Unfortunately this expressiveness comes
at the cost of automated inference and verification; an embedding of this system
into Liquid Haskell [63] described in [27] was forced to sacrifice strong updates.

Work by Degen et al. [17] introduced linear state annotations to Java. To effect
strong updates in the presence of aliasing, like CONSORT, their system requires
that annotated memory locations be mutated only through a distinguished reference.
Further, all aliases of this mutable reference give no information about the state
of the object, much like our 0 ownership pointers. However, their system cannot
handle multiple, immutable aliases with non-trivial annotation information; only
the mutable reference may have non-trivial annotation information.

The fractional ownerships in CONSORT and their counterparts in [55, 56]
have a clear relation to linear type systems. Many authors have explored the
use of linear type systems to reason in contexts with aliased mutable references
[18, 19, 52], and in particular with the goal of supporting strong updates [1].
A closely related approach is RustHorn by Matsushita et al. [40]. Much like
CONSORT, RustHorn uses CHC and linear aliasing information for the sound
and (unlike CONSORT) complete verification of programs with aliasing and
mutability. However, their approach depends on Rust’s strict borrowing discipline,
and cannot handle programs where multiple aliased references are used in the
same lexical region. In contrast, CONSORT supports fine-grained, per-statement
changes in mutability and even further control with alias annotations, which
allows it to verify larger classes of programs.

The ownerships of CONSORT also have a connection to separation logic 
[45]; the separating conjunction isolates write effects to local subheaps, while 
CoNSORT’s ownership system isolates effects to local updates of pointer types. 
Other researchers have used separation logic to precisely support strong updates 
of abstract state. For example, in work by Kloos et al. [36] resources are associated 
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with static, abstract names; each resource (represented by its static name) may 
be owned (and thus, mutated) by exactly one thread. Unlike CONSORT, their 
ownership system forbids even temporary immutable, shared ownership, or 
transferring ownerships at arbitrary program points. An approach proposed by 
Bakst and Jhala [4] uses a similar technique, combining separation logic with 
refinement types. Their approach gives allocated memory cells abstract names, and 
associates these names with refinements in an abstract heap. Like the approach 
of Kloos et al. and CONSORT’s ownership 1 pointers, they ensure these abstract 
locations are distinct in all concrete heaps, enabling sound, strong updates. 
The idea of using a rational number to express permissions to access a refer- 
ence dates back to the type system of fractional permissions by Boyland [12]. His 
work used fractional permissions to verify race freedom of a concurrent program 
without a may-alias analysis. Later, Terauchi [59] proposed a type-inference algo- 
rithm that reduces typing constraints to a set of linear inequalities over rational 
numbers. Boyland’s idea also inspired a variant of separation logic for a concurrent 
programming language [11] to express sharing of read permissions among several 
threads. Our previous work [55, 56], inspired by that in [11, 59], proposed methods 
for type-based verification of resource-leak freedom, in which a rational number 
expresses an obligation to deallocate a certain resource, not just a permission. 
The issue of context-sensitivity (sometimes called polyvariance) is well-studied 
in the field of abstract interpretation (e.g., [28, 34, 41, 50, 51], see [25] for a recent 
survey). Polyvariance has also been used in type systems to assign different behav- 
iors to the same function depending on its call site [3, 6, 64]. In the area of refine- 
ment type systems, Zhu and Jagannathan developed a context-sensitive dependent 
type system for a functional language [67] that indexed function types by unique 
labels attached to call-sites. Our context-sensitivity approach was inspired by this 
work. In fact, we could have formalized context-polymorphism within the frame- 
work of full dependent types, but chose the current presentation for simplicity. 


7 Conclusion 


We presented CONSORT, a novel type system for safety verification of imperative 
programs with mutability and aliasing. CONSORT is built upon the novel combi- 
nation of fractional ownership types and refinement types. Ownership types flow- 
sensitively and precisely track the existence of mutable aliases. CONSORT admits 
sound strong updates by discarding refinement information on mutably-aliased 
references as indicated by ownership types. Our type system is amenable to auto- 
matic type inference; we have implemented a prototype of this inference tool and 
found it can verify several non-trivial programs and outperforms a state-of-the-art 
program verifier. As an area of future work, we plan to investigate using fractional 
ownership types to soundly allow refinements that mention memory locations. 
Abstract. Session types describe patterns of interaction on communicating 
channels. Traditional session types include a form of choice 
whereby servers offer a collection of options, of which each client picks 
exactly one. This sort of choice constitutes a particular case of separated 
choice: offering on one side, selecting on the other. We introduce mixed 
choices in the context of session types and argue that they increase the 
flexibility of program development at the same time that they reduce 
the number of synchronisation primitives to exactly one. We present a 
type system incorporating subtyping and prove preservation and absence 
of runtime errors for well-typed processes. We further show that classi- 
cal (conventional) sessions can be faithfully and tightly embedded in 
mixed choices. Finally, we discuss algorithmic type checking and a run- 
time system built on top of a conventional (choice-less) message-passing 
architecture. 


Keywords: Type Systems · Session Types · Mixed Choice. 


1 Introduction 


Session types provide for describing series of continuous interactions on commu- 
nication channels [16,19,43,45,49]. When used in type systems for programming 
languages, session type systems statically verify that programs follow protocols, 
and hence that they do not engage in communication mismatches. 

In order to motivate mixed sessions, suppose that we want to describe a 
process that asks for a fixed but unbounded number of integer values from some 
producer. The consumer may be in two states: happy with the values received 
so far, or ready to ask the producer for a new value. In the former case it must 
notify the producer so that this may stop sending numbers. In the latter case, 
the client must ask the producer for another integer, after which it “goes back 
to the beginning”. Using classical sessions, and looking from the consumer side, 
the communication channel can be described by a (recursive) session type T of 
the form 


⊕{enough: end, more: ?int.T} 


where ⊕ denotes internal choice (the consumer decides), the two branches in the 
choice are labelled with enough and more, type end denotes a channel on which 
no further interaction is possible, and ?int denotes the reception of an integer 
value. Reception is a prefix to a type, the continuation is T (in this case the “goes 
back to the beginning” part). The code for the consumer (and the producer as 
well) is unnecessarily complex, featuring parts that exchange messages in both 
directions: enough and more selections from the consumer to the producer, and int 
messages from the producer to the consumer. In particular, the consumer must 
first select option more (outgoing) and then receive an integer (incoming). 
Using mixed sessions one can invert the direction of the more selection and 
write the type of the channel (again as seen from the side of the consumer) as 


⊕{enough!unit.end, more?int.T} 


The changes seem merely cosmetic, but label/polarity pairs (polarity is ! or ?) 
are now indivisible and constitute the keys of the choice type when seen as a 
map. The integer value is piggybacked on top of selection more. As a result, the 
classical session primitive operations: selection and branching (that is, internal 
and external choice) and communication (output and input) become one only: 
mixed session. The producer can be safely written as 


p (enough?z. 0 + more!n. produce!(p, n+1)) 


offering a choice on channel end p featuring mixed branches with labels enough? 
and more!, where 0 denotes the terminated process and produce!(p, n+1) a recursive 
call to the producer. The example is further developed in Section 2. 

Mixed sessions build on Vasconcelos’ presentation of session types, which we 
call classical sessions [43], by adapting choice and input/output as needed, but 
keeping everything else unchanged as much as possible. The result is a language 
with 

— a single synchronisation/communication primitive: mixed choice on a given 
channel that 

— allows for duplicated labels in choice processes, leading to non-determinism 
in a pure linear setting, and 

— replicated output processes arising naturally from replicated mixed choices, 
and that 

— enjoys preservation and absence of runtime errors for typable processes, and 

— provides for embedding classical sessions in a tight type and operational 
correspondence. 


The rest of the paper is organised as follows: the next section shows mixed ses- 
sions in action; Section 3 introduces the technical development of the language, 
and Section 4 proves the main results (preservation and absence of runtime 
errors for typable processes). Then Section 5 presents the embedding and the 
correspondence proofs, Section 6 discusses implementation details, and Section 7 
explores related work. Section 8 concludes the paper. 


2 There is Room for Mixed Sessions 


This section introduces the main ideas of mixed sessions via examples. We address 
mixed choices, duplicated labels in choices, and unrestricted output, in this 
order. 
2.1 Mixed Choices 


Consider the producer-consumer problem where the producer produces only 
insofar as requested by the consumer. Here is the code for a producer that 
writes on channel end x numbers starting from n. 


def produce (x, n) = 
lin x (enough?z. 0 + 
more!n. produce!(x, n+1) 
) 


Syntax qx(M+N) introduces a choice between M and N on channel end x. Qualifier 
q is either un or lin and controls whether the process is persistent (remains after 
reduction) or is ephemeral (is consumed in the reduction process). Each branch 
in a choice is composed of a label (enough or more), a polarity mark (input ? 
or output !), a variable or a value (z or n), and a continuation process (after 
the dot). The terminated process is represented by 0; notation def introduces a 
recursive process. The def syntax and its encoding in the base language is from 
the Pict programming language [36] and taken up by Sepi [12]. 

A consumer that requests n integer values on channel end y can be written 
as follows, where () represents the only value of type unit. 


def consume (y, n) = 
if n = 0 
then lin y (enough!(). 0) 
else lin y (more?z. consume!(y, n-1)) 


Suppose that x and y are two ends of the same channel. When choices on x and 
on y get together, a pair of matching label-polarity pairs is selected and a value 
transmitted from the output continuation to the input continuation. 
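
As an illustration only, the synchronisation just described can be mimicked in a few lines of Python. The encoding of a choice as a list of (label, polarity, payload) triples and the function names `produce_choice`, `consume_choice`, `synchronise` and `run` are assumptions of this sketch; they are not part of the calculus or of any implementation discussed in the paper.

```python
# Each mixed choice is modelled as a list of (label, polarity, payload) branches.

def produce_choice(n):
    # lin x (enough?z. 0 + more!n. produce!(x, n+1))
    return [("enough", "?", None), ("more", "!", n)]

def consume_choice(k):
    # if k = 0 then lin y (enough!(). 0) else lin y (more?z. consume!(y, k-1))
    return [("enough", "!", ())] if k == 0 else [("more", "?", None)]

def synchronise(xs, ys):
    """Return (label, value) for the first pair of branches with equal labels
    and opposite polarities; the value travels from the '!' side to the '?' side."""
    for (lx, px, vx) in xs:
        for (ly, py, vy) in ys:
            if lx == ly and px != py:
                return lx, (vx if px == "!" else vy)
    raise RuntimeError("no matching branches")

def run(start, requests):
    """Let produce(x, start) interact with consume(y, requests) and
    return the list of integers the consumer received."""
    received, n, k = [], start, requests
    while True:
        label, value = synchronise(produce_choice(n), consume_choice(k))
        if label == "enough":
            return received
        received.append(value)
        n, k = n + 1, k - 1

print(run(5, 3))  # [5, 6, 7]
```

The sketch picks the first matching pair, which suffices here because at most one pair of branches matches at each step.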

Types for the two channel ends ensure that choice synchronisation succeeds. 
The type of x is rec a. lin &{enough?unit.end, more!int.a} where the qualifier lin 
says that the channel end must be used in exactly one process, & denotes external 
choice, and each branch is composed of a label, a polarity mark, the type of the 
communication, and that of the continuation. The type end states that no further 
interaction is possible at the channel and rec introduces a recursive type. The 
type of y is obtained from that of x by inverting views (⊕ and &) and polarities 
(! and ?), yielding rec b. lin ⊕{enough!unit.end, more?int.b}. The choice at x in the 
produce process contains all branches in the type and so we select an external 
choice view & for x. The choices at y contain only part of the branches, hence 
the internal choice view ⊕. This type discipline ensures that processes do not 
engage in runtime errors when trying to find a match for two choices at the two 
ends of a given channel. 
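
The inversion of views and polarities can be sketched as a small function over a hypothetical tuple encoding of types, in which "+" stands for ⊕. The function is deliberately naive: it flips views and polarities structurally and leaves payload types untouched, which is adequate for the types of this section (where recursion variables occur only in continuation position) but is not offered as a general dual-type construction.

```python
# Types as nested tuples (an encoding assumed for this sketch):
#   ("end",) | ("var", a) | ("rec", a, T) | ("choice", qualifier, view, branches)
# where branches is a tuple of (label, polarity, payload, continuation).

FLIP = {"+": "&", "&": "+", "!": "?", "?": "!"}

def dual(t):
    """Flip views and polarities; payload types are left untouched."""
    tag = t[0]
    if tag in ("end", "var"):
        return t
    if tag == "rec":
        return ("rec", t[1], dual(t[2]))
    _, q, view, branches = t
    return ("choice", q, FLIP[view],
            tuple((l, FLIP[p], s, dual(k)) for (l, p, s, k) in branches))

# rec a. lin &{enough?unit.end, more!int.a}
x_type = ("rec", "a", ("choice", "lin", "&", (
    ("enough", "?", "unit", ("end",)),
    ("more", "!", "int", ("var", "a")))))

y_type = dual(x_type)
print(y_type)
```

The result corresponds to the type of y given above, up to the name of the bound type variable.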

A few type and process abbreviations simplify coding: i) the lin qualifier 
can be omitted, ii) the terminated process 0 together with the trailing dot can 
be omitted; iii) the terminated type end together with the trailing dot can be 
omitted; and iv) we introduce wildcards (_) in variable binding positions (in 
input branches). 
2.2 Duplicated Labels in Choices for Types and for Processes 


Classical session types require distinct identifiers to label distinct branches. 
Mixed sessions relax this restriction by allowing duplicated labels whenever 
paired with distinct polarities. The next example describes two processes— 
countDown and collect —that bidirectionally exchange a fixed number of msg- 
labelled messages. The number of messages that flow in each direction is not 
fixed a priori, but instead decided by the non-deterministic operational semantics. 
The type that describes the channel, as seen by process countDown, is 
rec a.⊕{msg!unit.a, msg?unit.a, done!unit}, where one can see the msg label in two 
distinct branches, but with different polarities. 

Process countDown features a parameter n that controls the number of mes- 
sages exchanged (sent or received). The end of the interaction (when n reaches 
0) is signalled by a done message. 


countDown : (rec a.⊕{msg!unit.a, msg?unit.a, done!unit}, int) 
def countDown (x, n) = 
if n = 0 
then x (done!()) 
else x (msg!(). countDown!(x, n-1) + 
msg?_. countDown!(x, n-1)) 


Process collect sees the channel from the dual viewpoint, obtained by exchanging 
? with ! and ⊕ with &. Parameter n in this case denotes the number 
of messages received. When done, the process writes the result on channel end r, 
global to the collect process. 


collect : (rec b.&{msg!unit.b, msg?unit.b, done?unit}, int) 
def collect (y, n) = 
y (msg!(). collect!(y, n+1) + 
msg?_. collect!(y, n) + 
done?_. r (result!n)) 


Mixed sessions allow for duplicated message-polarity pairs permitting a new 
form of non-determinism that uses exclusively linear channels. A process of the 
form (vxy)P declares a channel with end points x and y to be used in process P. 
The process 


(νxy) ( 
x (msg!()) | 
y (msg?_. z (m!true) + msg?_. z (m!false)) 
) 


featuring two linear choices may reduce to z (m!true) or to z (m!false). 
Non-determinism in the π-calculus without choice (that of Functions as Processes 
[27,29] for example) can only be achieved by introducing race conditions on un 
channels. For example, the π-calculus process 


(νxy)(x!() | y?_.z!true | y?_.z!false) 
reduces either to (z!true | (νxy)y?_.z!false) or to (z!false | (νxy)y?_.z!true), 
leaving for the runtime the garbage collection of the inert residuals. Also note 
that in this case, channel y cannot remain linear. 

Duplicated message-polarities in choices lead to elegant and concise code. A 
random number generator with a given number n of bits can be written with two 
processes. The first process sends n messages on channel end x. The contents of 
the messages are irrelevant (we use value () of type unit); what is important is 
that n more messages are sent, followed by a done message, followed by silence. 


write : (rec a.⊕{done!unit, more!unit.a}, int) 
def write (x, n) = 
if n = 0 
then x (done!()) 
else x (more!(). write!(x, n-1)) 


The reader process reads the more messages in two distinct branches and 
interprets messages received on one branch as bit 0, and on the other as 1. Upon 
the reception of a done message, the accumulated random number is conveyed 
on channel end r, a variable global to the read process. 


read : (rec b.&{done?unit, more?unit.b}, int) 
def read (y, n) = 
y (done?_. r (result!n) + 
more?_. read!(y, 2*n) + 
more?_. read!(y, 2*n+1) 
) 


Notice that mixed sessions allow duplicated label-polarity pairs in processes 
but not in types. This point is further discussed in Section 3. Also note that 
duplicated message labels could be easily added to traditional session types. 
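
The write/read pair above can be simulated with the following Python sketch. It is only an illustration: the tuple encoding of branches, the names `write_choice`, `READ_BRANCHES` and `random_bits`, and the use of a uniformly random pick to stand in for the non-deterministic choice of a matching branch are all assumptions of the sketch (the operational semantics does not prescribe any probability distribution).

```python
import random

def write_choice(n):
    # x (done!()) when n = 0, otherwise x (more!(). write!(x, n-1))
    return ("done", "!") if n == 0 else ("more", "!")

# y (done?_. r (result!acc) + more?_. read!(y, 2*acc) + more?_. read!(y, 2*acc+1))
READ_BRANCHES = [
    ("done", "?", None),
    ("more", "?", lambda acc: 2 * acc),
    ("more", "?", lambda acc: 2 * acc + 1),
]

def random_bits(n, rng):
    """Run write(x, n) against read(y, 0) and return the value sent on r."""
    acc = 0
    while True:
        label, _ = write_choice(n)
        # every input branch of the reader that matches the writer's output
        candidates = [step for (l, p, step) in READ_BRANCHES
                      if l == label and p == "?"]
        step = rng.choice(candidates)  # stands in for the non-deterministic semantics
        if step is None:               # done: the accumulated number goes out on r
            return acc
        acc, n = step(acc), n - 1

print(random_bits(8, random.Random()))  # some integer in the range 0..255
```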


2.3 Unrestricted Output 


Mixed sessions allow for replicated output processes. The original version of 
the π-calculus [30,31] features recursion on arbitrary processes. Subsequent versions 
[29] introduce replication but restricted to input processes. When compared 
to languages with unrestricted input only, unrestricted output allows for more 
concise programs and fewer message exchanges for the same effect. Here is a 
process (call it P) containing a pair of processes that exchange msg-labelled 
messages ad aeternum, 


(νxy)(un y (msg!()) | un x (msg?_)) 


where x is of type rec a.un &{msg?unit.a}. The un prefix denotes replication: an 
un choice survives reduction. Because neither of the two subprocesses features a 
continuation, P reduces to P in one step. The behaviour of un y (msg!()) can be 
mimicked by a process without output replication, namely, 


(νwz) w (ℓ!()) | un z (ℓ?_. y (msg!(). w (ℓ!()))) 


v ::=                           Values: 
        x                         variable 
        true | false              boolean values 
        ()                        unit value 

P ::=                           Processes: 
        q x Σ_{i∈I} M_i           choice 
        P | P                     parallel composition 
        (νxx)P                    scope restriction 
        if v then P else P        conditional 
        0                         inaction 

M ::=                           Branches: 
        l⋆v.P                     branch 

⋆ ::=                           Polarities: 
        ! | ?                     out and in 

q ::=                           Qualifiers: 
        lin | un                  linear and unrestricted 


Fig. 1: The syntax of processes 


Even if unrestricted output can be simulated with unrestricted input, the encod- 
ing requires one extra channel (wz) and an extra message exchange (on channel 
wz) in order to reestablish the output on channel end y. 


It is a fact that unrestricted output can be added to any flavour of the π-calculus 
(session-typed or not). In the case of mixed sessions it arises naturally: 
there is only one communication primitive—choice—and this can be classified as 
lin or un. If an un-choice happens to behave in “output mode”, then we have an un- 
output. It is not obvious how to design the language of mixed choices without 
allowing unrestricted output, while still allowing unrestricted input (which is 
mandatory for unbounded behaviour). 


3 The Syntax and Semantics of Mixed Sessions 


This section introduces the syntax and the semantics of mixed sessions. Inspired 
by Vasconcelos’ formulation of session types for the π-calculus [43,45], mixed 
sessions replace input and output, selection and branching (internal and external 
choice), with a single construct which we call choice. 
3.1 Syntax 


Figure 1 presents the syntax of values and processes. Let x, y, z range over a 
(countable) set of variables, and let l range over a set of labels. Metavariable v 
ranges over values. Following the tradition of the π-calculus, set up by Milner 
et al. [30,31], variables are used both as placeholders for incoming values in 
communication and for channels. Linearity constraints, central to session types 
but absent in the π-calculus, dictate that the two ends of a channel must be 
syntactically distinguished; we use one variable for each end [43]. Different primitive 
values can be used. Here, we pick the boolean values (so that we may have 
a conditional process), and unit that plays its role in the embedding of classical 
session types (Section 5). 

Metavariables P and Q range over processes. Choices are processes of the 
form q x Σ_{i∈I} M_i offering a choice of M_i alternatives on channel end x. Qualifier q 
describes how choice behaves with respect to reduction. If q is lin, then the choice 
is consumed in reduction, otherwise q must be un, and in this case the choice 
persists after reduction. The type system in Figure 8 rejects nullary (empty) 
choices. There are two forms of branches: output l!v.P and input l?x.P. An 
output branch sends value v and continues as P. An input branch receives a 
value and continues as P with the value replacing variable x. The type system 
in Figure 8 makes sure that value v in l?v.P is a variable. 

The remaining process constructors are standard in the π-calculus. Processes 
of the form P | Q denote the parallel composition of processes P and Q. Scope 
restriction (νxy)P binds together the two channel ends x and y of the same channel 
in process P. The conditional process if v then P else Q behaves as process P if 
v is true and as process Q otherwise. Since we do not have nullary choices, we 
include 0, called inaction, as a primitive to denote the terminated process. 


3.2 Operational Semantics 


The variable bindings in the language are as follows: variables x and y are bound 
in P, in a process of the form (νxy)P; variable x is bound in P in a choice of 
the form l?x.P. The sets of bound and free variables, as well as substitution, 
P[v/x], are defined accordingly. We work up to alpha-conversion and follow 
Barendregt’s variable convention, whereby all variables in binding occurrences 
in any mathematical context are pairwise distinct and distinct from the free 
variables [2]. 

Figure 2 summarises the operational semantics of mixed sessions. Following 
the tradition of the π-calculus, a binary relation on processes, structural congruence, 
rearranges processes when preparing for reduction. Such an arrangement 
reduces the number of rules included in the operational semantics. Structural 
congruence was introduced by Milner [27,29]. It is defined as the least congru- 
ence relation closed under the axioms in Figure 2. The first three rules state that 
parallel composition is commutative, associative, and takes inaction as the neu- 
tral element. The fourth rule is commonly known as scope extrusion [30,31] and 
allows extending the scope of channel ends x, y to process Q. The side-condition 
Structural congruence, P ≡ P 


P | Q ≡ Q | P      (P | Q) | R ≡ P | (Q | R)      P | 0 ≡ P 
(νxy)P | Q ≡ (νxy)(P | Q)      (νxy)0 ≡ 0      (νwx)(νyz)P ≡ (νyz)(νwx)P 


Reduction, P → P 


if true then P else Q → P                                            [R-IFT] 
if false then P else Q → Q                                           [R-IFF] 
(νxy)(lin x(M + l!v.P + M′) | lin y(N + l?z.Q + N′) | R) → 
    (νxy)(P | Q[v/z] | R)                                            [R-LINLIN] 
(νxy)(lin x(M + l!v.P + M′) | un y(N + l?z.Q + N′) | R) → 
    (νxy)(P | Q[v/z] | un y(N + l?z.Q + N′) | R)                     [R-LINUN] 
(νxy)(un x(M + l!v.P + M′) | lin y(N + l?z.Q + N′) | R) → 
    (νxy)(P | Q[v/z] | un x(M + l!v.P + M′) | R)                     [R-UNLIN] 
(νxy)(un x(M + l!v.P + M′) | un y(N + l?z.Q + N′) | R) → 
    (νxy)(P | Q[v/z] | un x(M + l!v.P + M′) | un y(N + l?z.Q + N′) | R) 
                                                                     [R-UNUN] 

     P → Q               P → Q          P ≡ P′   P′ → Q′   Q′ ≡ Q 
-----------------     -------------     -------------------------- 
(νxy)P → (νxy)Q       P | R → Q | R               P → Q 
    [R-RES]              [R-PAR]                [R-STRUCT] 


Fig. 2: Operational semantics 


“x and y not free in Q” is redundant in the face of the Barendregt convention. The 
fifth rule allows collecting channel bindings no longer in use, and the last rule 
allows for rearranging the order of channel bindings in a process. 

Reduction includes six axioms, two for the destruction of boolean values (via 
a conditional process), and four for communication. The axioms for communication 
take processes of a similar nature. The scope restriction (νxy) identifies 
the two ends of the channel engaged in communication. Under the scope of the 
channel one finds three processes: the first contains an output process on channel 
end x, the second contains an input process on channel end y, and the third 
(R) is an arbitrary process that may contain other references to x and y (the 
witness process). Communication proceeds by identifying a pair of compatible 
branches, namely l!v.P and l?z.Q. The result contains the continuation process 
P and the continuation process Q with occurrences of the bound variable z 
replaced by value v (together with the witness process). The four axioms differ 
in the treatment of the process qualifiers: lin (ephemeral) and un (persistent). 
Ephemeral processes are consumed in reduction, persistent processes remain in 
the contractum. 
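
The four communication axioms can be summarised by a single function that enumerates the possible contracta of two choices. The sketch below is an illustration under stated assumptions: choices are encoded as (qualifier, branches) pairs, continuations are opaque Python values, the witness process R is left out, and the name `reducts` is hypothetical. It only treats communication from the choice at x to the choice at y, as the axioms do.

```python
def reducts(sender, receiver):
    """One-step results of communication from `sender` (the choice at x)
    to `receiver` (the choice at y).

    A choice is (qualifier, branches); a branch is (label, polarity, payload, cont).
    An output branch holds a value and an opaque continuation; an input branch
    holds None and a function from the received value to its continuation."""
    results = []
    for (l1, p1, v, P) in sender[1]:
        for (l2, p2, _, Q) in receiver[1]:
            if l1 == l2 and (p1, p2) == ("!", "?"):
                contractum = [P, Q(v)]
                # un choices persist in the contractum; lin choices are consumed
                contractum += [c for c in (sender, receiver) if c[0] == "un"]
                results.append(contractum)
    return results

# Two output branches with the same label-polarity pair against one input branch:
# lin x(l!true.0 + l!false.0) | lin y(l?z.lin w(m!z.0))
x = ("lin", [("l", "!", True, "0"), ("l", "!", False, "0")])
y = ("lin", [("l", "?", None, lambda z: f"lin w(m!{z}.0)")])
for contractum in reducts(x, y):
    print(contractum)
```

Each element of the returned list is one possible contractum, so a list with more than one element witnesses non-determinism.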

Choices apart, rules [R-LINLIN] and [R-LINUN] are already present in the 
works of Milner and Vasconcelos [29,43]. Rules [R-UNLIN] and [R-UNUN] are 
absent there on the grounds of economy: replicated output can be simulated with a 
new channel and a replicated input. In mixed choices these rules cannot be 


Mixed Sessions 723 


T ::=                           Types: 
        q♯{U_i}_{i∈I}             choice 
        end                       termination 
        unit | bool               unit and boolean 
        μa.T                      recursive type 
        a                         type variable 

U ::=                           Branches: 
        l⋆T.T                     branch 

♯ ::=                           Views: 
        ⊕ | &                     internal and external 

Γ ::=                           Contexts: 
        ·                         empty 
        Γ, x: T                   entry 


Fig. 3: The syntax of types 


omitted for there is no distinction between input and output: choice is the only 
(symmetrical) communication primitive. 

We have designed mixed choices in such a way that labels may be duplicated 
in choices; moreover, label-polarity pairs may also be duplicated. This allows for 
non-determinism in a linear context. For example, process 


(νxy)(lin x(l!true.0 + l!false.0) | lin y(l?z.lin w(m!z.0))) 


reduces in one step to either lin w(m!true.0) or lin w(m!false.0). 

The examples in Section 2 take advantage of a def notation, a derived process 
construct inspired by the SePi [12] and the Pict languages [36]. A process of the 
form def x(z) = P in Q is understood as 


(νxy)(un y(ℓ?z.P) | Q) 


and calls to the recursive procedure, of the form x!v, are interpreted as lin x(ℓ!v), 
for ℓ an arbitrarily chosen label. The derived syntax hides channel end y and 
simplifies the syntax of calls to the procedure. Procedures with more than one 
parameter require tuple passing, a notion that is not primitive to mixed sessions. 
Fortunately, tuple passing is easy to encode; see Vasconcelos [43]. 


3.3 Typing 


Figure 3 summarises the syntax of types. We rely on an extra set, that of type 
variables, a,b,... Types describe values, including boolean and unit values, and 
Branch subtyping, U <: U 


S2 <: S1    T1 <: T2          S1 <: S2    T1 <: T2 
--------------------          -------------------- 
l!S1.T1 <: l!S2.T2            l?S1.T1 <: l?S2.T2 


Subtyping, T <: T 


                                              S[μa.S/a] <: T      S <: T[μa.T/a] 
                                              --------------      -------------- 
end <: end    unit <: unit    bool <: bool      μa.S <: T           S <: μa.T 

J ⊆ I    U_j <: V_j  (j ∈ J)                I ⊆ J    U_i <: V_i  (i ∈ I) 
--------------------------------          -------------------------------- 
q⊕{U_i}_{i∈I} <: q⊕{V_j}_{j∈J}            q&{U_i}_{i∈I} <: q&{V_j}_{j∈J} 


Fig. 4: Coinductive subtyping rules 


channel ends. A type of the form q♯{U_i}_{i∈I} denotes a channel end. Qualifier q 
states the number of processes that may contain references to the channel end: 
exactly one for lin, zero or more for un. View ♯ distinguishes internal (⊕) from 
external (&) choice. This distinction is not present in processes but is of paramount 
importance for typing purposes, as we shall see. The branches are either of 
output (l!S.T) or of input (l?S.T) nature. In either case, S denotes the object 
of communication and T describes the subsequent behaviour of the channel 
end. Type end denotes the channel end on which no more interaction is possible. 
Types μa.T and a cater for recursive types. 

Types are subject to a few syntactic restrictions: i) choices must have at least 
one branch; ii) label-polarity pairs (l⋆) are pairwise distinct in the branches of 
a choice type (unlike in processes); iii) recursive types are assumed contractive 
(that is, containing no subterm of the form μa_1 … μa_n.a_1). New variables, new 
bindings: type variable a is bound in T in type μa.T. Again, bound and free 
names, as well as substitution S[T/a], are defined accordingly. 

Mixed sessions come equipped with a notion of subtyping. Figure 4 introduces 
the rules that allow determining whether a given type is a subtype of another. 
The rules must be read coinductively. Base types (end, unit, bool) are subtypes 
of themselves. The rules for recursive types are standard. Subtyping behaves 
differently in the presence of internal or external choice. For internal choice (⊕) we 
require the branches in the subtype to contain those in the supertype: exercising 
fewer options cannot cause difficulties on the receiving side. For external choice (&) we 
require the opposite: here offering more choices cannot cause runtime errors. 
For branches we distinguish output from input: output is contravariant on the 
contents of the message, input is covariant. In either case, the continuation is 
covariant. Choices, input/output, and recursive types receive no different treat- 
ment than those in classical sessions [15]. We can easily show that the <: relation 
is a preorder. Notation S = T abbreviates S <: T and T <: S. 
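
The coinductive reading of the rules in Figure 4 can be approximated by a checker that records the pairs of types it has already assumed. The sketch below is illustrative only: the tuple encoding of types (with "+" standing for ⊕), the helper names `subst`, `unfold` and `subtype`, and the base type `int` are assumptions of the sketch, and types are assumed closed and contractive.

```python
# Types: ("end",) | ("unit",) | ("bool",) | ("int",) | ("var", a) | ("rec", a, T)
#        | ("choice", qualifier, view, branches), view "+" (for ⊕) or "&",
# branches: tuple of (label, polarity, payload_type, continuation_type).

def subst(t, a, r):
    tag = t[0]
    if tag == "var":
        return r if t[1] == a else t
    if tag == "rec":
        return t if t[1] == a else ("rec", t[1], subst(t[2], a, r))
    if tag == "choice":
        return ("choice", t[1], t[2],
                tuple((l, p, subst(s, a, r), subst(k, a, r)) for (l, p, s, k) in t[3]))
    return t  # base types

def unfold(t):
    # rec a. T  ~>  T[rec a. T / a]; terminates on contractive types
    while t[0] == "rec":
        t = subst(t[2], t[1], t)
    return t

def subtype(s, t, seen=frozenset()):
    if (s, t) in seen:               # coinductive hypothesis
        return True
    seen = seen | {(s, t)}
    s, t = unfold(s), unfold(t)
    if s[0] != "choice" or t[0] != "choice":
        return s == t                # base types are subtypes of themselves
    _, qs, vs, bs = s
    _, qt, vt, bt = t
    if (qs, vs) != (qt, vt):
        return False
    sub = {(l, p): (pay, k) for (l, p, pay, k) in bs}
    sup = {(l, p): (pay, k) for (l, p, pay, k) in bt}
    # "+": keys of the supertype within those of the subtype (J ⊆ I); "&": I ⊆ J
    small, big = (sup, sub) if vs == "+" else (sub, sup)
    if not set(small) <= set(big):
        return False
    for (l, p) in small:
        (s1, k1), (s2, k2) = sub[(l, p)], sup[(l, p)]
        # output is contravariant in the payload, input is covariant
        payload_ok = subtype(s2, s1, seen) if p == "!" else subtype(s1, s2, seen)
        if not (payload_ok and subtype(k1, k2, seen)):
            return False
    return True

end, unit, boolean = ("end",), ("unit",), ("bool",)
wide = ("choice", "lin", "+", (("l", "?", boolean, end), ("m", "!", unit, end)))
narrow = ("choice", "lin", "+", (("l", "?", boolean, end),))
print(subtype(wide, narrow), subtype(narrow, wide))  # True False
```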

Duality is a notion central to session types. In order for channel communi- 
cation to proceed smoothly, the two channel ends must be compatible: if one 
end says input, the other must say output; if one end says external choice, the 
Polarity duality and view duality, ⋆ ⊥ ⋆ and ♯ ⊥ ♯ 


! ⊥ ?      ? ⊥ !      ⊕ ⊥ &      & ⊥ ⊕ 


Type duality, T ⊥ T 


               ♯ ⊥ ♯′    ⋆_i ⊥ ⋆′_i    S_i = S′_i    T_i ⊥ T′_i 
               ---------------------------------------------------------- 
end ⊥ end      q♯{l_i⋆_i S_i.T_i}_{i∈I} ⊥ q♯′{l_i⋆′_i S′_i.T′_i}_{i∈I} 

S[μa.S/a] ⊥ T        S ⊥ T[μa.T/a] 
-------------        ------------- 
  μa.S ⊥ T             S ⊥ μa.T 


Fig. 5: Coinductive type duality rules 


un and lin predicates, un(T), lin(T) 


                                                  un(T) 
                                                --------- 
un(end)    un(unit)    un(bool)    un(un♯{U_i})    un(μa.T)    lin(T) 


Fig. 6: The un and lin predicates on types 


other must say internal choice. In presence of recursive types, the problem of 
building the dual of a given type has been elusive, as works by Bernardi and 
Hennessy, Bono and Padovani, Lindley and Morris show [5,7,25]. Here we eschew 
the problem by working with a duality relation, as in Gay and Hole [15]. 

The rules in Figure 5 define what we mean for two types to be dual. This 
is the coinductive definition of Gay and Hole in rule format (and adapted to 
choice). Duality is defined for session types only. Type end is the dual of itself. 
The rule for choice types requires dual views (& is the dual of ⊕, and vice-versa) 
and dual polarities (? is the dual of !, and vice-versa). Furthermore, the objects 
of communications must be equivalent (S_i = S′_i) and the continuations must be 
dual again (T_i ⊥ T′_i). The rules in the second line handle recursion in the exact 
same way as in type equivalence. As an example, we can easily show that 


μa.lin⊕{l!bool.lin&{m!unit.a}} ⊥ lin&{l?bool.μb.lin⊕{m?unit.lin&{l?bool.b}}} 


It can be shown that ⊥ is an involution, that is, if R ⊥ S and S ⊥ T, then 
R = T. 

The meaning of the un and lin predicates is defined by the rules in Figure 6. 
Basic types (unit, bool, end) are unrestricted; un-annotated choices are 
unrestricted; μa.T is unrestricted if T is. Contractivity ensures that the predicate 
is total. All types are lin, meaning that both lin and non-lin types may be 
used in linear contexts. 

Before presenting the type system, we need to introduce two notions that 
manipulate typing contexts. The rules in Figure 7 define the meaning of context 
split and context update. These two relations are taken verbatim from Vasconcelos 
[43]; context split is originally from Walker [48] (cf. Kobayashi et al. [22,23]). 
Context split is used when type checking processes with two sub-processes. In 
Context split, Γ = Γ ∘ Γ 


               Γ1 ∘ Γ2 = Γ    un(T) 
               ---------------------------------- 
· = · ∘ ·      Γ, x: T = (Γ1, x: T) ∘ (Γ2, x: T) 

       Γ = Γ1 ∘ Γ2                              Γ = Γ1 ∘ Γ2 
----------------------------------      ---------------------------------- 
Γ, x: lin p = (Γ1, x: lin p) ∘ Γ2       Γ, x: lin p = Γ1 ∘ (Γ2, x: lin p) 


Context update, Γ + x: T = Γ 


     x: U ∉ Γ                    un(T)    T = U 
---------------------      ------------------------------ 
Γ + x: T = Γ, x: T         (Γ, x: T) + x: U = (Γ, x: T) 


Fig. 7: Inductive context split and context update rules 


this case we split the context in two, by copying unrestricted entries to both 
contexts and linear entries to one only. Context update is used to add to a given 
context an entry representing the continuation (after a choice operation) of a 
channel. If the variable in the entry is not in the context, then we add the entry 
to the context. Otherwise we require the entry to be present in the context and 
the type to be unrestricted. 
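
Both operations can be sketched over contexts represented as Python dictionaries. The representation is an assumption of this sketch: base types are strings, channel types are (qualifier, description) pairs, type equivalence is simplified to equality, and the names `is_un`, `splits` and `update` are hypothetical.

```python
def is_un(t):
    # base types are unrestricted; a channel type is unrestricted if un-qualified
    return t in ("unit", "bool", "end") or t[0] == "un"

def splits(ctx):
    """Yield every (ctx1, ctx2) with ctx = ctx1 ∘ ctx2: unrestricted entries
    are copied to both sides, each linear entry goes to exactly one side."""
    items = list(ctx.items())

    def go(i, left, right):
        if i == len(items):
            yield dict(left), dict(right)
            return
        x, t = items[i]
        if is_un(t):
            yield from go(i + 1, left + [(x, t)], right + [(x, t)])
        else:
            yield from go(i + 1, left + [(x, t)], right)
            yield from go(i + 1, left, right + [(x, t)])

    yield from go(0, [], [])

def update(ctx, x, t):
    """ctx + x: t. Adds a fresh entry, or accepts an existing unrestricted
    entry of the same type; anything else is rejected."""
    if x not in ctx:
        return {**ctx, x: t}
    if is_un(t) and ctx[x] == t:
        return ctx
    raise TypeError(f"cannot update {x}")

ctx = {"x": ("lin", "S"), "b": "bool"}
for left, right in splits(ctx):
    print(left, right)
```

A type checker following Figure 8 would try the splits until one makes the premises hold; enumerating them eagerly, as done here, is exponential in the number of linear entries and is meant only to make the relation concrete.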


The rules in Figure 8 introduce the typing system for mixed sessions. Here the 
un and lin predicates on types are pointwise extended to typing contexts. Notice 
that all contexts are linear and only some contexts are unrestricted. We require 
all instances of the axioms to be built from unrestricted contexts, thus ensuring 
that linear resources (channel ends) are fully consumed in typing derivations. 


The typing rules for values should be straightforward: constants have their 
own types, the type for a variable is read from the context, and [T-SUB] is the 
subsumption rule, allowing a type to be replaced by a supertype. 


The rules for branches, [T-OUT] and [T-IN], follow those for output and 
input in classical session types. To type an output branch we split the context 
in two: one part for the value, the other for the continuation process. To type an 
input branch we add an entry with the bound variable x to the context under 
which we type the continuation process. Rule [T-IN] rejects branches of the form 
l?v.P when v is not a variable. The continuation type T is not used in either rule; 
instead it is incorporated in the type for the channel in Γ (cf. rule [T-CHOICE] 
below). 

The rules for inaction, parallel composition, and conditional are from Vasconcelos 
[43]. That for scope restriction is adapted from Gay and Hole [15]. Rule 
[T-INACT] follows the general pattern for axioms, requiring a un context. Rule 
[T-PAR] splits the context in two, providing each subprocess with one part. Rule 
[T-IF] splits the context and uses one part to type guard v. Because v is unrestricted, 
we know that Γ1 contains exactly the un entries in Γ1 ∘ Γ2 and that Γ2 
is equal to Γ1 ∘ Γ2. Context Γ2 is used to type both branches of the conditional, 
for only one of them will ever execute. Rule [T-RES] introduces in the typing 
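context entries for the two channel ends, x and y, at dual types. 
Typing rules for values, Γ ⊢ v: T 


   un(Γ)               un(Γ)                  un(Γ1, Γ2)            Γ ⊢ v: S    S <: T 
------------    ---------------------    --------------------    -------------------- 
Γ ⊢ (): unit    Γ ⊢ true, false: bool    Γ1, x: T, Γ2 ⊢ x: T          Γ ⊢ v: T 
  [T-UNIT]       [T-TRUE] [T-FALSE]            [T-VAR]                 [T-SUB] 


Typing rules for branches, Γ ⊢ M: U 


Γ1 ⊢ v: S    Γ2 ⊢ P              Γ, x: S ⊢ P 
-----------------------      ------------------- 
Γ1 ∘ Γ2 ⊢ l!v.P: l!S.T       Γ ⊢ l?x.P: l?S.T 
       [T-OUT]                     [T-IN] 


Typing rules for processes, Γ ⊢ P 


  un(Γ)          Γ1 ⊢ P    Γ2 ⊢ Q 
---------        ------------------ 
  Γ ⊢ 0          Γ1 ∘ Γ2 ⊢ P | Q 
[T-INACT]             [T-PAR] 

Γ1 ⊢ v: bool    Γ2 ⊢ P    Γ2 ⊢ Q          Γ, x: S, y: T ⊢ P    S ⊥ T 
--------------------------------          ------------------------- 
Γ1 ∘ Γ2 ⊢ if v then P else Q                   Γ ⊢ (νxy)P 
            [T-IF]                               [T-RES] 

q1(Γ1 ∘ Γ2)    Γ1 ⊢ x: q2♯{l_i⋆_i S_i.T_i}_{i∈I} 
Γ2 + x: T_j ⊢ l_j⋆_j v_j.P_j: l_j⋆_j S_j.T_j  (j ∈ J)    {l_j⋆_j}_{j∈J} = {l_i⋆_i}_{i∈I} 
---------------------------------------------------------------------------------- 
Γ1 ∘ Γ2 ⊢ q1 x Σ_{j∈J} l_j⋆_j v_j.P_j 
[T-CHOICE] 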


Fig. 8: Inductive typing rules 


The rule for choice is new. The incoming context is split in two: one for the 
subject x of the choice, the other for the various branches in the choice. The 
qualifier of the process, q1, dictates the nature of the incoming context: un or lin. 
This allows for a linear choice to contain channels of an arbitrary nature, but 
limits unrestricted choices to unrestricted channels only (for one cannot predict 
how many times such choices will be exercised). The second premise extracts a 
type $q_2\sharp\{l_i^{\star_i} S_i.T_i\}_{i \in I}$ for $x$. The third premise types 
each branch: type $S_j$ is used to type the value $v_j$ in the branch and type $T_j$ 
is used to type the corresponding continuation. The rule updates context 
$\Gamma_2$ with the continuation type of $x$: if $q_2$ is lin, then $x$ is not in 
$\Gamma_2$ and the update operation simply adds the entry to the context. If, on 
the other hand, $q_2$ is un, then $x$ is in $\Gamma_2$ and the context update 
operation (together with rule [T-Sub]) insists that type $T_j$ is a subtype 
of $\mathsf{un}\sharp\{l_i^{\star_i} S_i.T_i\}_{i \in I}$, meaning that $T_j$ is a 
recursive type. 


The last premise to rule [T-Choice] insists that the set of labels in the 
choice type coincides with that in the choice process. That does not mean that 
the label-polarity pairs are in a one-to-one correspondence: label-polarity pairs 
are pairwise distinct in types (see the syntactic restrictions in Section 3.3), 
but not in processes. For example, process $\mathsf{lin}\,x(l?y.\mathbf{0} + l?z.\mathbf{0})$ 
can be typed against context $x : \mathsf{lin}\oplus\{l?\mathsf{bool}.\mathsf{end}\}$. 
Nor does the fact that the two sets of labels must coincide imply that the 
label-polarity pairs of the type in the context coincide with those in the 
process. Taking advantage of subtyping, the above process can still be typed 
against context $x : \mathsf{lin}\oplus\{l?\mathsf{bool}.\mathsf{end}, m!\mathsf{unit}.\mathsf{end}\}$ because 
$\mathsf{lin}\oplus\{l?\mathsf{bool}.\mathsf{end}, m!\mathsf{unit}.\mathsf{end}\} <: \mathsf{lin}\oplus\{l?\mathsf{bool}.\mathsf{end}\}$. 
The opposite phenomenon happens with external choice, where one may remove 
branches by virtue of subtyping. 

We complete this section by discussing examples that illustrate options taken 
in the typing system (we postpone the formal justification to Section 4). Suppose 
we allow empty choices in the syntax of types. Then the process 


\[ (\nu xy)(x() \mid y()) \]


would be typable by taking $x : \oplus(), y : \&()$, yet the process would not reduce. 
We could add an extra reduction rule for the effect 


\[ (\nu xy)(x() \mid y() \mid R) \to (\nu xy)R \]


which would satisfy preservation (Theorem 2). We decided not to include it in 
our reduction rules as we did not want the extra complexity. Including the rule 
also does not bring any apparent benefit. 

The syntax of processes places no restrictions on the label-polarity pairs 
in choices; yet that of types does. What if we relax the restriction that 
label-polarity pairs in choice types must be pairwise distinct? Then process 


\[ (\nu xy)(x(l!\mathsf{true} + l!()) \mid y(l?z.\mathsf{if}\ z\ \mathsf{then}\ \mathbf{0}\ \mathsf{else}\ \mathbf{0})) \]


could be typed under context $x : \&\{l!\mathsf{bool}, l!\mathsf{unit}\}, y : \oplus\{l?\mathsf{bool}, l?\mathsf{unit}\}$, 
yet the process might reduce to $\mathsf{if}\ ()\ \mathsf{then}\ \mathbf{0}\ \mathsf{else}\ \mathbf{0}$, which is a runtime error. 


4 Well-typed Mixed Sessions Do Not Lead to Runtime 
Errors 


This section introduces the main results of mixed choices: absence of runtime 
errors and preservation, both for well-typed processes. 
We say that a process is a runtime error if it is structurally congruent to: 


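— a process of the form 
\[ (\nu x_1y_1)\cdots(\nu x_ny_n)\Big(q\,x_n \sum_{i \in I} l_i^{\star_i} v_i.P_i \;\mid\; q'\,y_n \sum_{j \in J} l_j^{\bullet_j} w_j.Q_j \;\mid\; R\Big) \]
where $\{l_i^{\star_i}\}_{i \in I} \cap \{l_j^{\overline{\bullet_j}}\}_{j \in J} = \emptyset$, 
with each $\overline{\bullet_j}$ obtained by dualising $\bullet_j$, or 
— a process of the form $q\,x(M + l?v.P + N)$ where $v$ is not a variable, or 
— a process of the form $\mathsf{if}\ v\ \mathsf{then}\ P\ \mathsf{else}\ Q$ where $v$ is neither true nor false. 


Examples of processes which are runtime errors include: 

\[ (\nu xy)(\mathsf{lin}\,x(l!\mathsf{true}.\mathbf{0}) \mid \mathsf{lin}\,y(l!\mathsf{true}.\mathbf{0})) \]
\[ (\nu xy)(\mathsf{un}\,x(l!\mathsf{true}.\mathbf{0}) \mid \mathsf{lin}\,y(m?z.\mathbf{0})) \]
\[ \mathsf{un}\,x(l?\mathsf{false}.\mathbf{0}) \]
\[ \mathsf{if}\ ()\ \mathsf{then}\ \mathbf{0}\ \mathsf{else}\ \mathbf{0} \]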
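The first clause of the runtime-error definition can be read as a small set computation. The following is only an illustrative sketch, not part of the formal development: it assumes that each choice is summarised by its list of label-polarity pairs, and the names `dual`, `has_matching_pair` and `is_communication_error` are hypothetical.

```python
from typing import Iterable, Tuple

# A label-polarity pair: polarity is "!" (output) or "?" (input).
Pair = Tuple[str, str]


def dual(polarity: str) -> str:
    """Dualise a polarity: output becomes input and vice versa."""
    return {"!": "?", "?": "!"}[polarity]


def has_matching_pair(x_pairs: Iterable[Pair], y_pairs: Iterable[Pair]) -> bool:
    """True if some pair offered on x meets a dual pair offered on y."""
    dualised_y = {(label, dual(pol)) for label, pol in y_pairs}
    return bool(set(x_pairs) & dualised_y)


def is_communication_error(x_pairs: Iterable[Pair], y_pairs: Iterable[Pair]) -> bool:
    """First clause of the definition: the two choices share no matching pair."""
    return not has_matching_pair(x_pairs, y_pairs)
```

Under this reading, two choices that both output on label l are an error, as are choices on distinct labels, whereas an output on l facing an input on l is not.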
Notice that processes of the form $(\nu xy)\,\mathsf{lin}\,x \sum_{i \in I} M_i$ cannot be classified as 
runtime errors for they may be typed. Just think of 
$(\nu xy)\,\mathsf{lin}\,x(l?z.\mathsf{lin}\,y(l!\mathsf{true}.\mathbf{0}))$, 
typable under the empty context. Unlike the interpretations of session types 
in linear logic by Caires, Pfenning and Wadler [8,14,46,47], typable mixed 
session processes can easily deadlock. Similarly, processes with more than one 
lin-choice on the same channel end can be typed. For example, process 
$\mathsf{lin}\,x(l!\mathsf{true}.\mathbf{0}) \mid \mathsf{lin}\,x(l?z.\mathbf{0})$ can be typed under context 
$x : \mu a.\mathsf{un}\oplus\{l?\mathsf{unit}.a, l!\mathsf{bool}.a\}$. Recall the 
relationship between qualifiers in processes, $q_1$, and those in types, $q_2$, in the 
discussion of the rules for choice in Section 3. 


Theorem 1 (Well-typed processes are not runtime errors). If $\cdot \vdash P$, 
then $P$ is not a runtime error. 


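Proof. In view of a contradiction, assume that $\cdot \vdash P$ and that $P$ is 


\[ (\nu x_1y_1)\cdots(\nu x_ny_n)\Big(q_1\,x_n \sum_{i \in I} l_i^{\star_i} v_i.P_i \;\mid\; q_2\,y_n \sum_{j \in J} l_j^{\bullet_j} w_j.Q_j \;\mid\; R\Big) \]


and $\{l_i^{\star_i}\}_{i \in I} \cap \{l_j^{\overline{\bullet_j}}\}_{j \in J} = \emptyset$, where 
$\overline{\bullet_j}$ is the dual of $\bullet_j$. From the typing derivation for $P$, 
using [T-Par] and [T-Res], we obtain a context $\Gamma = \Gamma_1 \circ \Gamma_2 \circ \Gamma_3 = 
x_1 : T_1, y_1 : S_1, \ldots, x_n : T_n, y_n : S_n$, with $T_i \perp S_i$ for all 
$i = 1, \ldots, n$, and that 
$\Gamma_1 \vdash q_1\,x_n \sum_{i \in I} l_i^{\star_i} v_i.P_i$ and 
$\Gamma_2 \vdash q_2\,y_n \sum_{j \in J} l_j^{\bullet_j} w_j.Q_j$ and $\Gamma_3 \vdash R$. 
Without loss of generality, due to the fact that $x_n$ and $y_n$ have dual types 
and from the premises of rule [T-Choice], assume that 
$\Gamma_1 \vdash x_n : q\,\&\{l_k^{\star_k} S_k.T_k\}_{k \in K}$ and 
$\Gamma_2 \vdash y_n : q \oplus \{l_k^{\overline{\star_k}} S'_k.T'_k\}_{k \in K}$, with 
$\{l_i\}_{i \in I} = \{l_k\}_{k \in K}$ and $\{l_j\}_{j \in J} \subseteq \{l_k\}_{k \in K}$. 
This also implies that $\{l_i^{\star_i}\}_{i \in I} = \{l_k^{\star_k}\}_{k \in K}$. Thus, the 
dualised pair $l_j^{\overline{\bullet_j}}$ of a branch in 
$q_2\,y_n \sum_{j \in J} l_j^{\bullet_j} w_j.Q_j$ belongs to the set 
$\{l_i^{\star_i}\}_{i \in I}$, 
contradicting $\{l_i^{\star_i}\}_{i \in I} \cap \{l_j^{\overline{\bullet_j}}\}_{j \in J} = \emptyset$. 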

When $P$ is $q\,x(M + l?v.P' + N)$ and $v$ is not a variable, the contradiction is 
with rule [T-In], which can only be applied when the value $v$ is a variable. 

When $P$ is $\mathsf{if}\ v\ \mathsf{then}\ P'\ \mathsf{else}\ Q$ and $v$ is not a boolean value, the 
contradiction immediately arises with rule [T-If]. 


In order to prepare for the preservation result we introduce a few lemmas. 
Lemma 1 (Unrestricted weakening). If $\Gamma \vdash P$ and $\mathsf{un}(T)$, then $\Gamma, x : T \vdash P$. 


Proof. The proof goes by mutual induction on the rules for branches and 
processes, but we first need to show the result for the value typing rules. We need 
to show that if $\Gamma \vdash v : S$ and $\mathsf{un}(R)$ then $\Gamma, x : R \vdash v : S$. This 
follows by a simple case inspection of the rules [T-Unit], [T-True], [T-False], 
[T-Var], taking into consideration that $\mathsf{un}(R)$. For the rule [T-Sub], use the 
induction hypothesis to obtain $\Gamma, x : R \vdash v : S$ and conclude, using [T-Sub], 
that $\Gamma, x : R \vdash v : T$. 

For the branch and process typing rules we detail the proof when the last 
rule is [T-Out]. Using the result for typing values, we obtain $\Gamma_1, x : R \vdash v : S$, 
and the induction hypothesis for processes leads to $\Gamma_2, x : R \vdash P$. Using the 
un context split property, taking into account that $\mathsf{un}(R)$, we conclude that 
$\Gamma_1 \circ \Gamma_2, x : R \vdash l!v.P : l!S.T$. 

For the process rule [T-Inact], the result is a simple consequence of $\mathsf{un}(T)$. 
For the other rules, the result follows by induction hypothesis on the process and 
branch rules, as well as using the value typing result. We detail the proof for 
rule [T-If]. Using the result for typing values, we know that $\Gamma_1, x : T \vdash v : \mathsf{bool}$. By 
induction hypothesis we also obtain that $\Gamma_2, x : T \vdash P$ and $\Gamma_2, x : T \vdash Q$. Using 
the un context split property, we conclude 
$\Gamma_1 \circ \Gamma_2, x : T \vdash \mathsf{if}\ v\ \mathsf{then}\ P\ \mathsf{else}\ Q$. 


Lemma 2 (Preservation for $\equiv$). If $\Gamma \vdash P$ and $P \equiv Q$, then $\Gamma \vdash Q$. 


Proof. As in Vasconcelos [43, Lemma 7.4] since we share the structural congru- 
ence axioms. 


Lemma 3 (Substitution). If $\Gamma_1 \vdash v : T$ and $\Gamma_2, x : T \vdash P$ and 
$\Gamma = \Gamma_1 \circ \Gamma_2$, then $\Gamma \vdash P[v/x]$. 


Proof. The proof follows by mutual induction on the rules for processes and 
branches. 


Theorem 2 (Preservation). If $\Gamma \vdash P$ and $P \to Q$, then $\Gamma \vdash Q$. 


Proof. The proof is by rule induction on the reduction, making use of the weaken- 
ing, substitution lemmas, and preservation for structural congruence. We sketch 
the cases for [R-LINLIN] and [R-LINUN]. 

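When reduction ends with rule [R-LinLin], we know that rule [T-Res] 
introduces $x : X, y : Y$ with $X \perp Y$ in the context $\Gamma$. From there, with applications 
of [T-Par] and [T-Choice], $\Gamma = \Gamma_1 \circ \Gamma_2 \circ \Gamma_3$ and 
$\Gamma_1 \vdash \mathsf{lin}\,x(M + l!v.P + M')$, 
$\Gamma_2 \vdash \mathsf{lin}\,y(N + l?z.Q + N')$, $\Gamma_3 \vdash R$. Furthermore, 
$\Gamma_1 = \Gamma_1' \circ \Gamma_1''$ and $\mathsf{lin}(\Gamma_1)$, 
$\Gamma_1' \vdash x : \mathsf{lin}\oplus\{M, l!S.T, M'\}$ and $\Gamma_1'', x : T \vdash l!v.P : l!S.T$. 
From the [T-Out] rule, $\Gamma_4 \vdash v : S$ and $\Gamma_5 \vdash P$, where 
$\Gamma_4 \circ \Gamma_5 = \Gamma_1'', x : T$. For the $y$ side, 
$\Gamma_2' \vdash y : \mathsf{lin}\&\{N, l?U.V, N'\}$ and 
$\Gamma_2'', y : V \vdash l?z.Q : l?U.V$. From the [T-In] rule, $\Gamma_6, y : V, z : U \vdash Q$. 
We also have that $S \equiv U$ from the duality of $x$ and $y$. Using the substitution 
Lemma 3, $\Gamma_6 \circ \Gamma_4, y : V \vdash Q[v/z]$. Using [T-Par] with the remaining 
contexts and [T-Res] types the conclusion of [R-LinLin]. 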

When reduction ends with rule [R-LinUn], we know that rule [T-Res] 
introduces $x : X, y : Y$ with $X \perp Y$ in the context $\Gamma$. From there, with applications 
of [T-Par] and [T-Choice], $\Gamma = \Gamma_1 \circ \Gamma_2 \circ \Gamma_3$ and 
$\Gamma_1 \vdash \mathsf{lin}\,x(M + l!v.P + M')$, 
$\Gamma_2 \vdash \mathsf{un}\,y(N + l?z.Q + N')$, $\Gamma_3 \vdash R$. Furthermore, 
$\Gamma_1 = \Gamma_1' \circ \Gamma_1''$ and $\mathsf{lin}(\Gamma_1)$, 
$\Gamma_1' \vdash x : \mathsf{un}\oplus\{M, l!S.T, M'\}$. Here $x$ is un since $x$ and $y$ are dual. 
We also have $\Gamma_1'', x : T \vdash l!v.P : l!S.T$, from which follows 
$\Gamma_4 \vdash v : S$ and $\Gamma_5 \vdash P$ from rule [T-Out]. For the $y$ side, 
$\Gamma_2' \vdash y : \mathsf{un}\&\{N, l?U.V, N'\}$ and $\Gamma_2'', y : V \vdash l?z.Q : l?U.V$, 
which has $\Gamma_6, y : V, z : U \vdash Q$ from [T-In]. 

Types $S$ and $U$ are equivalent due to the duality of $x$ and $y$, and so 
$\Gamma_6, y : V, z : S \vdash Q$. Using the substitution Lemma 3, 
$\Gamma_6 \circ \Gamma_4, y : V \vdash Q[v/z]$. From $\Gamma_5$ we also type 
the process $P$. Using [T-Par] with the remaining contexts and [T-Res] types 
the conclusion of [R-LinUn]. 
5 Classical Sessions Were Mixed All Along 


This section introduces the syntax and semantics of classical session types and 
shows that the language of classical sessions can be embedded in that of mixed 
sessions. 

The syntax and semantics of classical session types are in Figure 9; we follow 
Vasconcelos [43]. The syntax and the rules for the various judgements extend 
those of Figures 1 to 8, where we remove choice both from grammar productions 
(for processes and types) and from the various judgements (operational 
semantics, subtyping, duality, and typing). Regarding the syntax of processes, 
the choice construct of Figure 1 is replaced by new process constructors: output, 
linear (lin) and replicated (un) input, selection (internal choice) and branching 
(external choice). The four reduction axioms in Figure 2 that pertain to choice 
([R-LinLin], [R-LinUn], [R-UnLin], [R-UnUn]) are replaced by the three 
axioms in Figure 9. Rule [R-LinCom] describes the output against ephemeral-input 
interaction, rule [R-UnCom] the output against replicated-input interaction, 
and rule [R-Case] selects a label in the menu at the other channel end. 

The syntax of types features new constructs—linear or unrestricted input and 
output, and linear or unrestricted external and internal choice—replacing the 
choice construct in Figure 3. The subtyping rules for the new type constructors 
are taken from Gay and Hole [15]. Type duality is such that the objects of 
communication must be equivalent and the continuations (both in communication 
and choice) must be dual again. We omit the dual rules for $q!S.S' \perp q?T.T'$ and 
$q\&\{l_i : S_i\}_{i \in I} \perp q\oplus\{l_i : T_i\}_{i \in I}$. The new duality rules are adapted 
from the coinductive definition of Gay and Hole [15]. The un predicate on types 
insists on the idea that un-annotated types are unrestricted: 
$\mathsf{un}(\mathsf{un}\star S.T)$ and $\mathsf{un}(\mathsf{un}\sharp\{l_i : T_i\})$. 
The typing rule for choice in Figure 8 is replaced by the four rules in Figure 9; 
these are taken verbatim from Vasconcelos [43]. 

The embedding of classical session types in mixed sessions is defined in Fig- 
ure 10. It consists of two maps, one for processes, the other for types. These 
maps act as homomorphisms on all process and type constructors not explicitly 
shown. For example $[\![P \mid Q]\!] = [\![P]\!] \mid [\![Q]\!]$. We distinguish one label, msg, and 
use it to encode input and output (both processes and types). Input and output 
processes are encoded in choices with a single msg-labelled branch. The output 
process is qualified as lin (it does not survive reduction) and the input process 
reads its qualifier $q$ from the incoming process. Choice processes in classical 
sessions are encoded in choices in mixed sessions. The value transmitted on the 
mixed session is irrelevant: we pick $()$ of type unit for the output side, and a 
fresh variable $y_i$ on the input side. Both types are linear. 

Input and output types are translated in choice types. For output we 
arbitrarily pick an internal choice ($\oplus$), and conversely for the input. The label in the 
only branch is msg in order to match our pick for processes, and the qualifier is 
read from the incoming type. For classical choices, we read the qualifier and the 
view from the incoming type. The type of the communication in the branches of 
the mixed choice is unit, again so that it matches our pick for processes. 

Typing correspondence says that the embedding preserves typability. 
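Classical syntactic forms 

\[
\begin{array}{rcll}
P & ::= & \ldots & \text{Processes:} \\
  &     & x!v.P & \text{output} \\
  &     & q\,x?x.P & \text{input} \\
  &     & x \triangleleft l.P & \text{selection} \\
  &     & x \triangleright \{l_i : P_i\}_{i \in I} & \text{branching} \\[4pt]
T & ::= & \ldots & \text{Types:} \\
  &     & q \star T.T & \text{communication} \\
  &     & q \sharp \{l_i : T_i\}_{i \in I} & \text{choice}
\end{array}
\]

Classical reduction rules, $P \to P$ (plus [R-Res] [R-Par] [R-Struct] from Figure 2) 

\[ (\nu xy)(x!v.P \mid \mathsf{lin}\,y?z.Q \mid R) \to (\nu xy)(P \mid Q[v/z] \mid R) \quad \text{[R-LinCom]} \]
\[ (\nu xy)(x!v.P \mid \mathsf{un}\,y?z.Q \mid R) \to (\nu xy)(P \mid Q[v/z] \mid \mathsf{un}\,y?z.Q \mid R) \quad \text{[R-UnCom]} \]
\[ \frac{j \in I}{(\nu xy)(x \triangleleft l_j.P \mid y \triangleright \{l_i : Q_i\}_{i \in I} \mid R) \to (\nu xy)(P \mid Q_j \mid R)} \quad \text{[R-Case]} \]

Classical subtyping rules, $T <: T$ 

\[
\frac{T <: S \quad S' <: T'}{q!S.S' <: q!T.T'}
\qquad
\frac{S <: T \quad S' <: T'}{q?S.S' <: q?T.T'}
\]
\[
\frac{J \subseteq I \quad S_j <: T_j}{q\oplus\{l_i : S_i\}_{i \in I} <: q\oplus\{l_j : T_j\}_{j \in J}}
\qquad
\frac{I \subseteq J \quad S_i <: T_i}{q\&\{l_i : S_i\}_{i \in I} <: q\&\{l_j : T_j\}_{j \in J}}
\]

Classical type duality rules, $T \perp T$ 

\[
\frac{S \equiv T \quad S' \perp T'}{q?S.S' \perp q!T.T'}
\qquad
\frac{S_i \perp T_i}{q\oplus\{l_i : S_i\}_{i \in I} \perp q\&\{l_i : T_i\}_{i \in I}}
\]

Classical typing rules, $\Gamma \vdash P$ 

\[
\frac{\Gamma_1 \vdash x : q!T.U \quad \Gamma_2 \vdash v : T \quad \Gamma_3 + x : U \vdash P}{\Gamma_1 \circ \Gamma_2 \circ \Gamma_3 \vdash x!v.P}\ \text{[T-TOut]}
\]
\[
\frac{q_1(\Gamma_1 \circ \Gamma_2) \quad \Gamma_1 \vdash x : q_2?T.U \quad (\Gamma_2 + x : U), y : T \vdash P}{\Gamma_1 \circ \Gamma_2 \vdash q_1\,x?y.P}\ \text{[T-TIn]}
\]
\[
\frac{\Gamma_1 \vdash x : q\&\{l_i : T_i\}_{i \in I} \quad \Gamma_2 + x : T_i \vdash P_i \quad \forall i \in I}{\Gamma_1 \circ \Gamma_2 \vdash x \triangleright \{l_i : P_i\}_{i \in I}}\ \text{[T-Branch]}
\]
\[
\frac{\Gamma_1 \vdash x : q\oplus\{l_i : T_i\}_{i \in I} \quad \Gamma_2 + x : T_j \vdash P \quad j \in I}{\Gamma_1 \circ \Gamma_2 \vdash x \triangleleft l_j.P}\ \text{[T-Sel]}
\]

Fig. 9: Classical session types 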


Theorem 3 (Typing correspondence). 


1. If $\Gamma \vdash v : T$, then $[\![\Gamma]\!] \vdash v : [\![T]\!]$. 
2. If $\Gamma \vdash P$, then $[\![\Gamma]\!] \vdash [\![P]\!]$. 
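Process translation 
\[
\begin{array}{rcl}
[\![x!v.P]\!] & = & \mathsf{lin}\,x\{\mathsf{msg}!v.[\![P]\!]\} \\
[\![q\,x?y.P]\!] & = & q\,x\{\mathsf{msg}?y.[\![P]\!]\} \\
[\![x \triangleleft l.P]\!] & = & \mathsf{lin}\,x\{l!().[\![P]\!]\} \\
[\![x \triangleright \{l_i : P_i\}_{i \in I}]\!] & = & \mathsf{lin}\,x\{l_i?y_i.[\![P_i]\!]\}_{i \in I} \quad (y_i \notin \mathsf{fv}(P_i))
\end{array}
\]
(Homomorphic for $\mathbf{0}$, $P \mid Q$, $(\nu xy)P$, and $\mathsf{if}\ v\ \mathsf{then}\ P\ \mathsf{else}\ Q$) 

Type translation 
\[
\begin{array}{rcl}
[\![q!S.T]\!] & = & q\oplus\{\mathsf{msg}![\![S]\!].[\![T]\!]\} \\
[\![q?S.T]\!] & = & q\&\{\mathsf{msg}?[\![S]\!].[\![T]\!]\} \\
[\![q\oplus\{l_i : T_i\}_{i \in I}]\!] & = & q\oplus\{l_i!\mathsf{unit}.[\![T_i]\!]\}_{i \in I} \\
[\![q\&\{l_i : T_i\}_{i \in I}]\!] & = & q\&\{l_i?\mathsf{unit}.[\![T_i]\!]\}_{i \in I}
\end{array}
\]
(Homomorphic for end, unit, bool, $\mu a.T$, and $a$) 


Fig. 10: Embedding classical session types 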
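The process half of the embedding is a straightforward structural recursion. The sketch below is illustrative only and covers a fragment of the classical syntax (inaction, parallel composition, output, input, selection, branching). The class names, the string representation of values, and the `_y0, _y1, ...` scheme for fresh variables are assumptions made for this example; in particular, the sketch assumes that source processes never use names starting with `_y`.

```python
from dataclasses import dataclass
from itertools import count
from typing import Tuple


@dataclass(frozen=True)
class Nil:                      # inaction 0
    pass


@dataclass(frozen=True)
class Par:                      # P | Q
    left: object
    right: object


@dataclass(frozen=True)
class Out:                      # classical output x!v.P
    chan: str
    value: str
    cont: object


@dataclass(frozen=True)
class In:                       # classical input q x?y.P
    qual: str
    chan: str
    var: str
    cont: object


@dataclass(frozen=True)
class Sel:                      # classical selection: x selects label l, then P
    chan: str
    label: str
    cont: object


@dataclass(frozen=True)
class Branch:                   # classical branching: x offers {l_i: P_i}
    chan: str
    branches: Tuple[Tuple[str, object], ...]


@dataclass(frozen=True)
class Choice:                   # mixed choice: q x {l*v.P, ...}
    qual: str
    chan: str
    # each branch is (label, polarity, value or bound variable, continuation)
    branches: Tuple[Tuple[str, str, str, object], ...]


def translate(p, fresh=None):
    """Map a classical process to a mixed-choice process, following Figure 10."""
    fresh = fresh or (f"_y{i}" for i in count())   # assumed-fresh variable supply
    if isinstance(p, Nil):
        return p
    if isinstance(p, Par):
        return Par(translate(p.left, fresh), translate(p.right, fresh))
    if isinstance(p, Out):
        return Choice("lin", p.chan, (("msg", "!", p.value, translate(p.cont, fresh)),))
    if isinstance(p, In):
        return Choice(p.qual, p.chan, (("msg", "?", p.var, translate(p.cont, fresh)),))
    if isinstance(p, Sel):
        return Choice("lin", p.chan, ((p.label, "!", "()", translate(p.cont, fresh)),))
    if isinstance(p, Branch):
        return Choice("lin", p.chan, tuple(
            (label, "?", next(fresh), translate(cont, fresh))
            for label, cont in p.branches))
    raise TypeError(f"unsupported process form: {p!r}")
```

For instance, `translate(Sel("x", "l", Nil()))` yields a linear choice on `x` with the single branch `("l", "!", "()", Nil())`, mirroring the third equation of the process translation.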


Proof. 1. A straightforward rule induction on the hypothesis. 

2. By rule induction on the hypothesis. We sketch a few cases. 

When the derivation ends with [T-TIn], we use item 1., induction, the fact 
that $q_1(\Gamma_1 \circ \Gamma_2)$ implies $q_1([\![\Gamma_1 \circ \Gamma_2]\!])$, and that 
$(\Gamma_2 + x : U), y : T = (\Gamma_2, y : T) + x : U$ 
because $x$ and $y$ are distinct variables. 

When the derivation ends with [T-Branch], we obtain 
$([\![\Gamma_2]\!] + x : [\![T_i]\!]), y_i : \mathsf{unit} \vdash [\![P_i]\!]$ 
from the induction hypothesis $[\![\Gamma_2]\!] + x : [\![T_i]\!] \vdash [\![P_i]\!]$ using 
weakening (Lemma 1). 

□ 


We complete this section by proving that the classical-mixed translation 
meets Gorla’s good encoding criteria [17]. The five criteria proposed by Gorla 
ensure that the encoding is meaningful. There are two syntactical and three 
semantics-related criteria. 

Let $\mathcal{C}$ range over classical processes and $\mathcal{M}$ range over mixed choice processes. 
The map $[\![\cdot]\!] : \mathcal{C} \to \mathcal{M}$ described in Figure 10 is a translation from classical 
processes to mixed choice processes. To be in line with the criteria, we add the 
process $\checkmark$ representing a successfully terminating process to the syntax of both 
the source and the target languages. We denote by $\Rightarrow$ the reflexive and transitive 
closure of the reduction relations, $\to$, in both the source and target languages. 
Sometimes we use subscript $\mathcal{M}$ to denote the reduction of mixed choice processes 
and the subscript $\mathcal{C}$ for the reduction of classical processes, even though it should 
be clear from context. 

We say that a process $P$ does not reduce, $P \not\to$, when it cannot make any 
reduction step. We say that a process diverges, $P \to^{\omega}$, when $P$ can do an infinite 
number of reductions. On the other hand, a process is successful, $P \Downarrow$, if $P$ 
reduces to a process in parallel with a success $\checkmark$, that is, $P \Rightarrow P' \mid \checkmark$. Gorla's 
criteria view calculi as triples $(\mathcal{P}, \to, \asymp)$, where $\mathcal{P}$ is a set of processes, $\to$ a 
reduction relation (the operational semantics), and $\asymp$ is a behavioral equivalence 
on processes. 

The behavioral equivalence $\asymp$ for mixed sessions we use coincides with 
structural congruence $\equiv$. 

The first criterion states that the translation is compositional. For this 
purpose, we define a context $C(\_{}_1; \ldots; \_{}_k)$ as a classical process with $k$ holes. 


Theorem 4 (Compositionality). The translation $[\![\cdot]\!] : \mathcal{C} \to \mathcal{M}$ is 
compositional, i.e., for every $k$-ary operator $op$ of $\mathcal{M}$ and for every subset $N$ of channel 
ends, there exists a $k$-ary context $C^N_{op}(\_{}_1; \ldots; \_{}_k)$ such that, for all $P_1, \ldots, P_k$ with 
$\bigcup_{i=1}^{k} \mathsf{fv}(P_i) = N$, we have 
$[\![op(P_1, \ldots, P_k)]\!] = C^N_{op}([\![P_1]\!]; \ldots; [\![P_k]\!])$. 


Proof. The translation of a process is defined in terms of the translation of its 
subterms, see Figure 10. 


Following the ideas from Peters et al. [34], the translation from classical to 
mixed sessions can be enriched with a renaming policy $\varphi_{[\![\,]\!]}$, representing a 
map from channel ends to sequences of channel ends. The following theorem 
states that the proposed translation is name invariant. 


Theorem 5 (Name invariance). The translation $[\![\cdot]\!] : \mathcal{C} \to \mathcal{M}$ is name 
invariant, i.e., for every classical process $P$ and substitution $\sigma$, 


\[
[\![P\sigma]\!] \;
\begin{cases}
= [\![P]\!]\sigma' & \text{if } \sigma \text{ is injective} \\
\asymp [\![P]\!]\sigma' & \text{otherwise}
\end{cases}
\]
where $\sigma'$ is such that $\varphi_{[\![\,]\!]}(\sigma(x)) = \sigma'(\varphi_{[\![\,]\!]}(x))$, for every channel end $x$. 


Proof. The translation transforms each channel end ($x$, in Figure 10) into itself. 
Thus, any substitution is preserved. See Figure 10. 


Operational correspondence states that the embedding preserves and reflects 
reduction. In our case the embedding is quite tight: one reduction step in classical 
sessions corresponds to one reduction step in mixed sessions. There is no runtime 
penalty in running classical sessions on a mixed sessions machine. Further notice 
that we do not rely on any equivalence relation on mixed sessions to establish 
the result: mixed-sessions images leave no “junk” in the process of simulating 
classical sessions. 


Theorem 6 (Operational correspondence). Let $P, P'$ be classical sessions 
processes and $Q$ a mixed sessions process. 


1. If $P \to P'$, then $[\![P]\!] \to [\![P']\!]$. 
2. If $[\![P]\!] \to Q$, then $P \to P'$ and $[\![P']\!] = Q$, for some $P'$. 


Proof. Straightforward rule induction on the hypotheses, relying on the fact that 
$[\![P]\!][v/z] = [\![P[v/z]]\!]$ and $y_i \notin \mathsf{fv}(P_i)$ in the translation of 
$x \triangleright \{l_i : P_i\}_{i \in I}$. 
The following theorems concern the finite and infinite behavior of classical 
session processes and their corresponding translations. 


Theorem 7 (Divergence Reflection). The translation $[\![\cdot]\!] : \mathcal{C} \to \mathcal{M}$ reflects 
divergence, i.e., if $[\![P]\!] \to^{\omega}_{\mathcal{M}}$ then $P \to^{\omega}_{\mathcal{C}}$, for every process $P \in \mathcal{C}$. 


Proof. Corollary of Theorem 6. □ 


Theorem 8 (Success Sensitivity). The translation $[\![\cdot]\!] : \mathcal{C} \to \mathcal{M}$ is success 
sensitive, i.e., $P \Downarrow_{\mathcal{C}}$ iff $[\![P]\!] \Downarrow_{\mathcal{M}}$, for every process $P \in \mathcal{C}$. 


Proof. Corollary of Theorem 6. □ 


6 What is in the Way of a Compiler? 


This section discusses algorithmic type checking and the implementation of 
choice in message passing architectures. 

We start with type checking and then move to the runtime system. Gay and 
Hole present an algorithmic subtyping system for classical sessions [15]. Algo- 
rithmic subtyping for mixed sessions can be obtained by adapting the rules in 
Figure 4 along the lines of Gay and Hole. [T-Sub] is the only non syntax-directed 
rule in Figure 8. We delete this rule and distribute subtype checking among all 
rules that use, in their premises, sequents $\Gamma \vdash v : T$, as usual. Most of the rules 
include a non-deterministic context split operation. Take rule [T-Par], for 
example. Rather than guessing the right split, we take the incoming context and 
give it all to process $P$, later reclaiming the unused part. This outgoing context 
is then passed to process $Q$. The outgoing context of the parallel composition 
$P \mid Q$ is that of $Q$. See, e.g., Vasconcelos or Walker for details [43,48]. Rule 
[T-Res] requires guessing the type of the two channel ends, so that one is dual 
to the other. Rather than guessing the type of channel end $x$, we require the help 
of the programmer by working with an explicitly typed syntax, $(\nu xy : T)P$, as 
in Franco and Vasconcelos [12,43], where $T$ refers to the type of channel end $x$. 
For the type of channel end $y$, rather than guessing, we build it from type $T$; 
cf. [4,5,7,25]. 
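The idea of threading the context through [T-Par], instead of guessing a split, can be illustrated on a deliberately tiny toy language. The sketch below is not the type checker for mixed sessions: it assumes processes are nested tuples with only three hypothetical forms, `("nil",)`, `("use", x)` (a process that uses channel end `x` once) and `("par", P, Q)`, and that a context maps each name to the qualifier `"lin"` or `"un"`.

```python
from typing import Dict, Tuple

# A context maps channel ends to their qualifier: "lin" or "un".
Context = Dict[str, str]


def check(proc: Tuple, ctx: Context) -> Context:
    """Type-check proc against ctx and return the outgoing (unused) context."""
    tag = proc[0]
    if tag == "nil":
        return ctx
    if tag == "use":
        name = proc[1]
        if name not in ctx:
            raise TypeError(f"{name} is not available (consumed or undeclared)")
        if ctx[name] == "lin":
            # A linear entry is consumed: it does not appear in the outgoing context.
            return {k: v for k, v in ctx.items() if k != name}
        return ctx  # unrestricted entries are never consumed
    if tag == "par":
        # Give the whole context to the left process, then pass what it
        # left unused to the right process; the result is that of the right one.
        after_left = check(proc[1], ctx)
        return check(proc[2], after_left)
    raise ValueError(f"unknown process form: {tag}")


def well_typed(proc: Tuple, ctx: Context) -> bool:
    """A process is accepted if no linear entry is left over at the end."""
    try:
        leftover = check(proc, ctx)
    except TypeError:
        return False
    return all(qualifier == "un" for qualifier in leftover.values())
```

In this toy setting, using a linear `x` on both sides of a parallel composition is rejected, because the left process has already consumed the entry by the time the right process is checked.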

Running mixed sessions on a message passing architecture need not be an 
expensive operation. Take one of the communication axioms in Figure 2. We 
set up a broker process that receives the label-polarity pairs of both processes 
($\{l_i^{\star_i}\}_{i \in I}$ and $\{l_j^{\star_j}\}_{j \in J}$), decides on a matching pair (guaranteed to exist for typed 
processes), and communicates the result back to the two processes. The processes 
then exchange the appropriate value, and proceed. If the broker is an independent 
process, then we exchange five messages per choice synchronisation. This basic 
broker is instantiated for two processes 
$P \triangleq \mathsf{lin}\,x(l_1?z.P_1 + l_2!v_2.P_2 + l_3!v_3.P_3)$ and 
$Q \triangleq \mathsf{lin}\,y(l_1!v_1.Q_1 + l_3?w.Q_3)$ in Figure 11a. 

We can do better by piggybacking the values in the output choices together 
with the label-polarity pairs. The broker passes its decision to the input side 
in the form of a triple label-polarity-value, yielding one less message exchanged, 
as showcased in Figure 11b. 
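The message counts for an independent broker can be checked with a small simulation. This is a sketch under stated assumptions rather than a description of an actual runtime: offers are triples (label, polarity, value), with `None` as the value of an input branch; the broker picks the first matching pair in the order of P's offers; and the names `decide` and `run`, as well as the participant tags `"P"`, `"Q"` and `"B"`, are hypothetical.

```python
from typing import List, Optional, Tuple

# (label, polarity, value to send, or None for an input branch)
Offer = Tuple[str, str, Optional[str]]
# (sender, receiver, payload)
Message = Tuple[str, str, object]

DUAL = {"!": "?", "?": "!"}


def decide(p: List[Offer], q: List[Offer]) -> Optional[Tuple[str, str]]:
    """Return the first (label, polarity) of P that has a dual offer in Q."""
    q_pairs = {(label, pol) for label, pol, _ in q}
    for label, pol, _ in p:
        if (label, DUAL[pol]) in q_pairs:
            return label, pol
    return None


def run(p: List[Offer], q: List[Offer], piggyback: bool) -> List[Message]:
    """List the messages exchanged for one synchronisation via broker B."""
    decision = decide(p, q)
    if decision is None:
        raise ValueError("no matching pair (ruled out for typed processes)")
    label, pol = decision
    # The side whose chosen branch is an output is the sender of the value.
    sender, receiver, source = ("P", "Q", p) if pol == "!" else ("Q", "P", q)
    value = next(v for l, s, v in source if (l, s) == (label, "!"))
    if piggyback:
        # Values travel with the offers; the input side gets a triple.
        return [("P", "B", p), ("Q", "B", q),
                ("B", sender, label),
                ("B", receiver, (label, "?", value))]
    pairs = lambda offers: [(l, s) for l, s, _ in offers]
    return [("P", "B", pairs(p)), ("Q", "B", pairs(q)),
            ("B", "P", label), ("B", "Q", label),
            (sender, receiver, value)]
```

With offers shaped like the two processes discussed above, the basic variant produces five messages and the piggybacked variant four; the price of the latter is that the broker sees every offered value.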
[Figure: message sequence diagrams between P, Q and the broker. (a) Basic broker; (b) Values are piggybacked] 


Fig. 11: Broker is an independent process 


[Figure: message sequence diagrams between P and Q. (a) P is the broker; (b) Q is the broker] 


Fig. 12: Broker is P or Q 


Finally, we observe that the broker need not be an independent process; it can 
be located at one of the choice processes. This reduces the number of messages 
down to two messages in the general case, as described in Figures 12a and 12b 
where either P is the broker or Q is the broker. Even if the value was already 
sent by Q in the case that P is the broker, P must still let Q know which choice 
was taken, so that Q may proceed with the appropriate branch. 


However, in particular cases one message may be enough. Take, for instance, 
a process $P \triangleq \mathsf{un}\,x(l_1!v_1.P' + l_2!v_2.P')$. Independently of which branch is taken, 
the process proceeds as $P'$. Thus, if the broker is located in a process $Q$, then 
$P$ need not be informed of the selected choice. The same is true for classical 
sessions where selection is a mixed-out choice of a single branch. 


There are two other aspects that one should discuss when implementing 
mixed sessions on a message passing architecture other than the number of 
messages exchanged. 


The first is related to the type of broker used and to which values are revealed in 
a choice to the other party. In the case of the basic broker, only the chosen option 
value is revealed, and never to the broker itself. However, when we piggyback 
the values in the second type of broker, all values in the choice branches are 
revealed to the broker, even if they are not used in the end. This is even more 
striking in the case where one of the processes is the broker—the other party 
has access to all the possible values, independently of the choice that is taken. 
The second aspect is also related to the values themselves: in order to 
be presented in the choice, values must be computed a priori, even if they are 
not used in the choice. 

When dealing with the privacy of the values, we can choose which type of 
broker to use depending on how much we want to reveal to the other party. 
However, to prevent computing before a branch is chosen, one should instead 
use classical sessions. 


7 Related Work 


The origin of choice Free (completely unrestricted) choice is central to process 
algebras, including BPA and CCS [3,26]. Here we usually find processes of the 
form $P + Q$, where $P$ and $Q$ are arbitrary processes. Free choice is also present in 
the very first proposal of the $\pi$-calculus [30,31], even if Milner later uses guarded 
choice [28]. Sangiorgi and Walker's book builds on the pi-calculus with guarded 
(mixed) choice [38]. Guarded choices in all preceding proposals operate on 
possibly distinct channels ($x!\mathsf{true}.P + y?z.Q$) whereas choices on mixed sessions run 
on a common channel ($x(l!\mathsf{true}.P + m?y.Q)$). Kouzapas and Yoshida introduce 
the notion of mixed session in the context of multiparty session types [24]. Mul- 
tiparty session types are projected into binary session types, hence the authors 
also consider mixed choices for binary sessions. This language is not as concise 
as the one we present, probably because it is designed so as to match projection 
from multiparty types. 

Labelled-choices were embedded in the theory of session types by Honda 
et al. [18,19,41], where one finds primitives for value passing ($x!\mathsf{true}.P$ and 
$x?y.Q$) and, separately, for choice in the form of labelled selection ($x \triangleleft l.P$) 
and branching ($x \triangleright \{l_i : P_i\}_{i \in I}$); see Section 5. Coalescing label selection with 
output and branching with input was proposed by Vasconcelos [44] (and later 
used by Sangiorgi [37]) as a means to describe concurrent objects. Demangeon 
and Honda use a similar language to study embeddings of calculi for functions 
and for session-based communication [9]. All these languages offer only separated 
(unmixed) choices and only on the input side. 


Mixed choices in the Singularity operating system Concrete syntax apart, the 
language of linear mixed choices is quite similar to that of channel contracts in 
Sing# [10]. Rather than explicit recursive types, Sing# contracts use named 
states (akin to typestates [40]), providing for more legible contracts. In Sing#, 
each state in a contract corresponds to a mixed session $\mathsf{lin}\&\{l_i^{\star_i} S_i.T_i\}$ (contracts 
are always written from the consumer side) where each $l_i$ denotes a message tag, 
$\star_i$ the message direction (! or ?), $S_i$ the type of the value in the message, and $T_i$ 
the next state. 

Stengel and Bultan showed that processes that follow Sing# contracts can 
engage in communication errors [39]. They further provide a realizability condi- 
tion for contracts that essentially rules out mixed choices. Bono and Padovani 
present a calculus and a type system that models Sing# [6,7]. The type system 
ensures that well-typed processes are exempt from communication errors, but the 
language of types excludes mixed-choices. So it seems that Sing#-like languages 
only function properly under separated choice, yet our work survives under mixed 
choices. Contradiction? No! Sing# features asynchronous (or buffered) seman- 
tics whereas mixed sessions run under synchronous semantics. The operational 
semantics makes all the difference in this case. 


Synchronicity, asynchronicity, and choice Pierce and Turner identified the prob- 
lem: “In an asynchronous language guarded choice should be restricted still fur- 
ther since an asynchronous output in a choice is sensitive to buffering” [36] and 
Peters et al. state that “a discussion on synchrony versus asynchrony cannot 
be separated from a discussion on choice” [34,35]. Based on classical sessions, 
mixed sessions are naturally synchronous. The naive introduction of an asyn- 
chronous semantics would ruin the main results of the language (see Section 4). 
Asynchronous semantics are known to be compatible with classical sessions; 
see Honda et al. [20,21] for multiparty asynchronous session types and Fowler 
et al. [11] and Gay and Vasconcelos [16] for two examples of functional lan- 
guages with session types and asynchronous semantics. So one can ask whether 
a language can be designed where mixed-choices are handled synchronously and 
separated-choices asynchronously, a type-guided operational semantics with by- 
default asynchronous semantics, reverting to a synchronous semantics when in 
presence of mixed-choices. 


Separation results Palamidessi shows that the $\pi$-calculus with mixed choice is 
more expressive than its subset with separated choice [32]. Gorla provides a 
simpler proof [17] of the same result and Peters and Nestmann analyse the 
problem from the perspective of breaking initial symmetries in separated-choice 
processes [33]. Unlike the $\pi$-calculus with separated choices, mixed choices 
operate on the same channel and are guided by types. It would be interesting to look 
into separation results for classical sessions and mixed sessions. Are mixed 
sessions more expressive than classical sessions under some widely accepted criteria 
(those of Gorla [17], for example)? 


The origin of mixed sessions Mixed sessions dawned on us when looking into 
an algorithm to decide the equivalence of context-free session types [1,42]. The 
algorithm translates types into (simple) context-free grammars. The decision 
procedure runs on arbitrary simple grammars: the right-hand sides of grammar 
productions may start with a label-output or a label-input pair for the same 
non-terminal symbol at the left of the production. We then decided to explore 
mixed sessions and picked the simplest possible language for the effect: the 
$\pi$-calculus. It would be interesting to look into mixed context-free session types, 
given that decidability of type equivalence is guaranteed. 
8 Conclusion 


We introduce mixed sessions: session types with mixed choice. Classical session 
types feature separated choice; in fact all the proposals in the literature we are 
aware of provide for choice on the input side only, even if we can easily think 
of choice on the output side. Mixed sessions increase flexibility in programming 
and are easily realisable in conventional message passing architectures. 

Mixed choices come with a type system featuring subtyping. Typability is 
preserved by reduction. Furthermore well-typed programs are exempt from run- 
time errors. We provide suggestions on how to derive a type checking procedure, 
even if we do not formalise it. Classical session types are a particular case of 
mixed sessions: we provide for an encoding and show typing and operational 
correspondences. 

We leave open the problem of looking into a typed separation result (or a 
proof of inseparability) between classical sessions and mixed sessions. An 
interesting avenue for further development includes looking for a hybrid type-guided 
semantics, asynchronous by default, that reverts to synchronous when in 
presence of an output choice. 
Abstract. We develop a theory for two recently-proposed spreadsheet 
mechanisms: gridlets allow for abstraction and reuse in spreadsheets, and 
build on spilled arrays, where an array value spills out of one cell into 
nearby cells. We present the first formal calculus of spreadsheets with 
spilled arrays. Since spilled arrays may collide, the semantics of spilling 
is an iterative process to determine which arrays spill successfully and 
which do not. Our first theorem is that this process converges determin- 
istically. To model gridlets, we propose the grid calculus, a higher-order 
extension of our calculus of spilled arrays with primitives to treat spread- 
sheets as values. We define a semantics of gridlets as formulas in the grid 
calculus. Our second theorem shows the correctness of a remarkably di- 
rect encoding of the Abadi and Cardelli object calculus into the grid cal- 
culus. This result is the first rigorous analogy between spreadsheets and 
objects; it substantiates the intuition that gridlets are an object-oriented 
counterpart to functional programming extensions to spreadsheets, such 
as sheet-defined functions. 


1 Introduction 


Many spreadsheets contain repeated regions that share the same formatting and 
formulas, perhaps with minor variations. The typical method for generating each 
variation is to apply the operations copy-paste-modify. That is, the user copies 
the region they intend to repeat, pastes it into a new location, and makes local 
modifications to the newly pasted region such as altering data values, format- 
ting, or formulas. A common problem associated with copy-paste-modify is that 
updates to a source region will not propagate to a modified copy. A user must 
modify each copy manually—a process that is tedious and error-prone. 

Gridlets [12] are a high-level abstraction for re-use in spreadsheets based on 
the principle of live copy-paste-modify: a pasted region of a spreadsheet can be 
locally modified without severing the link to the source region. Changes to the 
source region propagate to the copy. 
The central idea of this paper is that we can implement gridlets using a 
formula operator G. If a cell a contains the formula 


$G(r, a_1, F_1, \ldots, a_n, F_n)$ 


then the behaviour is to copy range r, modify cells $a_i$ with formulas $F_i$, and 
paste the computed array in cell a where its elements may be displayed in the 
cells below and to the right. 

Consider the following example: 


     A        B            C                  A        B        C 
1 | “Edge”   “Len.”                      1 | “Edge”   “Len.” 
2 | “a”      3            =B2^2          2 | “a”      3        9 
3 | “b”      4            =B3^2          3 | “b”      4        16 
4 | “c”      =SQRT(C4)    =C2 + C3       4 | “c”      5        25 
    Source sheet                             Evaluated sheet 


The table computes and displays a Pythagorean triple, with intermediate cal- 
culation spread across many cells. To reuse the table a user creates a gridlet by 
inserting⁵ a G formula in cell A6 as follows. 


     A                            B    C        A        B        C 
6 | =G(A1:C4, B2, 7, B3, 24)               6 | “Edge”   “Len.” 
7 |                                        7 | “a”      7        49 
8 |                                        8 | “b”      24       576 
9 |                                        9 | “c”      25       625 
    Source sheet                               Evaluated sheet 


The formula in A6 is interpreted as: compute the source range A1:C4 with B2 
bound to 7, and B3 bound to 24. The result of the formula is an array corre- 
sponding to the computed range which then displays in the grid, emulating a 
paste action. A consequence of this design is that this single formula controls 
the content of a range of cells, below and to the right; we say that it spills into 
these cells. 
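
To make the copy, modify, evaluate reading of G concrete, the following Python sketch replays the example above. It is an illustration only and not the semantics developed later in the paper: representing formulas as Python functions of a lookup `ref`, and the names `evaluate` and `gridlet`, are assumptions of this sketch.

```python
import math

# Source sheet: constants are plain values; formulas are functions of a
# lookup `ref` that returns the value of another cell.
source = {
    "A1": "Edge", "B1": "Len.",
    "A2": "a", "B2": 3, "C2": lambda ref: ref("B2") ** 2,
    "A3": "b", "B3": 4, "C3": lambda ref: ref("B3") ** 2,
    "A4": "c", "B4": lambda ref: math.sqrt(ref("C4")),
    "C4": lambda ref: ref("C2") + ref("C3"),
}

def evaluate(sheet, addr):
    """Evaluate one cell; blank cells yield None."""
    cell = sheet.get(addr)
    if callable(cell):
        return cell(lambda other: evaluate(sheet, other))
    return cell

def gridlet(sheet, rows, cols, overrides):
    """Copy the sheet, rebind the overridden cells, and compute the range."""
    modified = {**sheet, **overrides}   # the source sheet is left untouched
    return [[evaluate(modified, f"{c}{r}") for c in cols] for r in rows]

# Corresponds to G(A1:C4, B2, 7, B3, 24).
result = gridlet(source, rows=range(1, 5), cols="ABC",
                 overrides={"B2": 7, "B3": 24})
for row in result:
    print(row)
```

Because the overrides are applied to a copy, later edits to `source` would be picked up the next time `gridlet` is called, which is the "live" aspect of copy-paste-modify.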

Our overall goal is to explain the semantics of the gridlet operator G using ar- 
ray spilling. Spilling is not new in spreadsheets: both Microsoft Excel and Google 
Sheets allow a cell to contain a formula that computes an array, and whose com- 
puted value then spills into vacant cells below and to the right. While there is a 
practical precedent for spilling in spreadsheets, there is no corresponding formal 
precedent from which to derive a semantics for G. This paper therefore proceeds 
in two parts. 


5 The user may enter this formula either directly, or indirectly via some grid-based 
interface [12]; details of the user experience are beyond the scope of this paper. 
First, we make sense of array spilling and its subtleties. Two formulas spilling 
into the same cell, or colliding, is one problem. Another problem is a formula 
spilling into an area on which it depends, triggering a spill cycle. Both problems 
make preserving determinism and acyclicity of spreadsheet evaluation a chal- 
lenge. We give a semantics of spilling that exploits iteration to determine which 
arrays spill successfully, and which do not. Our solution ensures that there is at 
most one array that spills into any address, and that the iteration converges. 

Second, we develop three new spreadsheet primitives that implement G when 
paired with spilled arrays. We present a higher-order spreadsheet calculus, the 
grid calculus, that admits sheets as first-class values and provides operations 
that manipulate sheet-values. Previous work has drawn connections between 
spreadsheets and object-oriented programming [5,8,9,15,17], but we give the first 
direct correspondence by showing that the Abadi and Cardelli object calculus [1] 
can be embedded in the grid calculus. Our translation constitutes a precise 
analogy between objects and sheets, and between methods and cells. 

In our semantics for gridlets, we make three distinct technical contributions: 


— We develop the spill calculus, the first formalisation of spilled arrays for 
spreadsheets. Our first theorem is that the iterative process of spilling we 
present converges deterministically (Section 4). Our formal analysis of spilled 
arrays, a feature now available in commercial spreadsheet systems, is a sub- 
stantial contribution of this work, independent of our gridlet semantics. 

— We develop the grid calculus, an extension of the spill calculus with three 
higher-order operators: GRID, VIEW, and UPDATE. These correspond to 
copy, paste, and modify, and suffice to encode the operator G (Section 5). 

— In the course of developing the grid calculus, we realised a close connection 
between gridlets and object-oriented programming. We make this precise by 
encoding the Abadi and Cardelli object calculus into the grid calculus. Our 
second theorem shows the correctness of this encoding (Section 6). 


2 Challenges of Spilling 


In this section we describe the challenges of implementing spilled arrays. We de- 
scribe core design principles for spreadsheet implementations and then illustrate 
how spilled arrays challenge these principles. 


2.1 Design Principles for Spreadsheet Evaluation 


Spreadsheet implementations rely on the following two properties to be pre- 
dictable and efficient. 


Determinism Evaluation should produce identical output given identical in- 
put; this property is exploited for efficient recalculation. 

Acyclicity Evaluation should not be self-referential. The dependency graph of 
a spreadsheet should form a directed acyclic graph and no cell should depend 
on its own value. Creating self-referential formulas cannot be prevented, but 
violations of acyclicity should be observable and not cause divergence. 
Both properties are satisfied by standard spreadsheet implementations, if we 
exclude a few nondeterministic worksheet functions such as RAND. Through- 
out this work we consider only deterministic worksheet functions. Given this 
assumption, spreadsheet formulas constitute a purely functional language, and 
so evaluation is deterministic. Cell evaluation tracks a calculating state for every 
cell and raises a circularity violation for any cell that depends on its own value. 

Spilled arrays pose a challenge for preserving determinism and acyclicity 
which we illustrate with examples. For the remainder of our technical develop- 
ments we drop the leading = from formulas. We begin with core terminology. 


Arrays Spreadsheet arrays are finite two-dimensional matrices that use 
one-based indexing and are non-empty. We denote an (m,n) array literal as 


$\{V_{1,1}, \ldots, V_{1,n}; \ldots; V_{m,1}, \ldots, V_{m,n}\}$ 


where (,) delimits the n columns and (;) delimits the m rows. We use V to 
range over values, which are described in Section 3. 

Spilling Address $a_r$ (i,j)-spills into address $a_t$ iff the value of $a_r$ is an (m,n) 
array and $a_t$ is $i - 1$ rows below and $j - 1$ columns right of $a_r$, where $i \in 1..m$ 
and $j \in 1..n$. In particular, $a_r$ (1,1)-spills into itself. 

Roots, targets, & areas If $a_r$ (i,j)-spills into address $a_t$ we call $a_r$ the spill 
root and $a_t$ a spill target. The spill area of $a_r$ is the set of its spill targets. 
The value of $a_t$ is element (i,j) of the array that is the value of $a_r$. 


Consider the following example: 


     A          B            A     B 
1 | {10, 20}             1 | 10    20 
2 |                      2 | 
    Source Sheet             Evaluated Sheet 


Address A1 evaluates to a (1,2) array and is a spill root with spill area {A1, B1}. 
Address A1 (1,1)-spills into A1, and (1, 2)-spills into B1. 
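
The offset arithmetic behind these definitions can be sketched in Python as follows. The helper names (`col_to_num`, `num_to_col`, `offset`, `spill_area`) and the use of A1-style strings are assumptions made for this illustration.

```python
import re

def col_to_num(name):
    """Convert a column name such as 'A' or 'AB' to a 1-based column number."""
    number = 0
    for ch in name:
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number

def num_to_col(number):
    """Convert a 1-based column number back to a column name."""
    name = ""
    while number > 0:
        number, rem = divmod(number - 1, 26)
        name = chr(ord("A") + rem) + name
    return name

def offset(addr, i, j):
    """Address that is i-1 rows below and j-1 columns right of addr."""
    match = re.fullmatch(r"([A-Z]+)([0-9]+)", addr)
    col, row = col_to_num(match.group(1)), int(match.group(2))
    return f"{num_to_col(col + j - 1)}{row + i - 1}"

def spill_area(root, m, n):
    """Set of spill targets of a root whose value is an (m, n) array."""
    return {offset(root, i, j) for i in range(1, m + 1) for j in range(1, n + 1)}

print(sorted(spill_area("A1", 1, 2)))
```

For the example above, the root A1 holding a (1,2) array has spill area {A1, B1}, and `offset("A1", 1, 1)` returns the root itself.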


2.2 Spill Collisions 


Spill collisions can be static or dynamic, and may interfere with determinism. 


Static Collision Every cell in a spill area should be blank except for the spill 
root; a blank cell has no formula. A static collision occurs when a spill root spills 
into another non-blank cell, and we say the non-blank cell is an obstruction. 
The choice to read the value from the obstruction or the spilled value violates 
determinism. We adopt a simple mechanism used by Excel and Sheets to resolve 
static spill collisions: the root evaluates to an error value, not an array, and spills 
nowhere. The ambiguity between reading the obstructing cell’s value and the 
root’s spilled value is resolved by preventing the root from spilling—we always 
read the value from the obstructing cell. Consider the following example: 
     A          B              A      B 
1 | {10,20}    40          1 | ERR    40 
2 |            B1 + 2      2 |        42 
    Source Sheet               Evaluated Sheet 


The address B1 obstructs spill root A1 and consequently address A1 evaluates 
to an error value, address B1 evaluates to 40, and address B2 evaluates to 42. 


Dynamic Collisions A dynamic collision occurs when a blank cell is a spill target 
for two distinct spill roots. Dynamic collisions can be resolved in different ways. 


— The conservative approach is to say no colliding spill root spills and each 
root evaluates to an error. 

— The liberal approach is to say that every colliding spill root spills. This 
approach can be non-deterministic because the spill target obtains its value 
by choosing one of the multiple colliding spill roots. Google Sheets takes the 
liberal approach. 

— An intermediate approach enforces what we call the single-spill policy. One 
root from the set of colliding roots is permitted to spill and the rest evaluate 
to an error. This approach can be non-deterministic because there is a choice 
of which root is permitted to spill. Excel takes the single-spill approach. 


Consider the following example that uses the single-spill approach: 


     A        B            A      B            A      B 
1 | B2       {3;4}     1 | 2      ERR      1 | 4      3 
2 | {1,2}              2 | 1      2        2 | ERR    4 
    Source Sheet           Root A2 wins        Root B1 wins 


Addresses A2 and B1 are spill roots: the former evaluates to an array of size 
(1,2) while the latter evaluates to an array of size (2,1). The value of address A1 
depends on which of the colliding spill roots A2 and B1 is permitted to spill. 
Arbitrarily selecting which root is permitted to spill violates deterministic 
evaluation. Sheets and Excel resolve collisions using an ordering that prefers 
newer formulas. While consecutive evaluations of the same spreadsheet will 
produce the same result, two syntactically identical spreadsheets constructed in 
different ways can produce different results. In Section 4 we give a deterministic 
semantics for spilling that uses a total ordering on addresses to select a single 
root from a set of colliding roots. 
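
The idea of resolving a dynamic collision with a fixed ordering on addresses can be sketched as below. This is a simplified illustration and not the refinement procedure of Section 4: the row-major ordering on (row, column) pairs, the greedy claiming of cells, and the names `area` and `permit` are all assumptions, and static obstructions are ignored.

```python
def area(root, m, n):
    """Cells covered by an (m, n) array rooted at root = (row, col)."""
    row, col = root
    return {(row + i, col + j) for i in range(m) for j in range(n)}

def permit(roots):
    """Decide which roots may spill.

    roots maps a root address (row, col) to the size (m, n) of its array.
    Roots are visited in a fixed total order (assumed here: row-major), and
    a root is permitted only if none of its cells is already claimed.
    """
    claimed, permits = set(), {}
    for root in sorted(roots):
        cells = area(root, *roots[root])
        permits[root] = claimed.isdisjoint(cells)
        if permits[root]:
            claimed |= cells
    return permits

# A2 holds {1,2}, a (1,2) array; B1 holds {3;4}, a (2,1) array; both target B2.
A2, B1 = (2, 1), (1, 2)
print(permit({A2: (1, 2), B1: (2, 1)}))
```

Under this particular ordering B1 precedes A2, so B1 spills and A2 is rejected, regardless of the order in which the two formulas were entered.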


2.3 Spill Cycles 


A spill cycle occurs when the value of a spill root depends on an address in its 
spill area. Spill cycles violate acyclicity and subtly differ from cell cycles. A cell 
cycle occurs when the value of a formula in a cell depends on the value of the 
cell itself. We know that it is never legal for a cell to read its own value and 
therefore it is possible to eagerly detect cell cycles during evaluation of a cell. In 
contrast, a spill cycle only occurs if the cell evaluates to an array that is spilled 
into a range the cell depends on, so it is not possible to detect the cycle until 
the cell has been evaluated. 

We can thus proactively detect cell cycles, but only retroactively detect spill 
cycles. To see why, let us consider the following example, wherein we assume 
the definition of a conditional operator IF that is lazy in the second and third 
arguments, and the function INC that maps over an array and increments every 
number and converts $\epsilon$ to 0, where $\epsilon$ is the value read from a blank cell. 


     A     B 
1 | 42    IF(A1 = 42, SUM(B2:B3), INC(B2:B3)) 
2 | 
3 | 


The evaluation of address B1 returns the sum of the range B2:B3. While the 
value of B1 depends on the values in the range B2:B3, the sum returns a scalar 
and therefore no spilling is required. 

Consider the case where the value in A1 is changed to 43. The address B1 
will evaluate the formula INC(B2:B3), first by dereferencing the range B2:B3 
to yield $\{\epsilon;\epsilon\}$, and then by applying INC to yield {0;0}. The array {0;0} will 
attempt to spill into the range B1:B2, which overlaps the range B2:B3 just read 
by the formula. The attempt to spill will induce a spill cycle; there is no 
consistent value that can be assigned to the addresses B1, B2, and B3. 

In Section 4 we give a semantics for spilling that uses dynamic dependency 
tracking to ensure that no spill root depends on its own spill area. 
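
The retroactive nature of the check can be illustrated with a short Python sketch: only once the value of the root is known can its spill area be compared with the addresses it read. The (row, column) encoding and the function names are assumptions of this sketch.

```python
def area(root, m, n):
    """Cells covered by an (m, n) array rooted at root = (row, col)."""
    row, col = root
    return {(row + i, col + j) for i in range(m) for j in range(n)}

def has_spill_cycle(root, size, deps):
    """True if the computed array would spill into a cell the root depends on.

    size is (m, n) for an array result, or None for a scalar result,
    in which case nothing spills and no spill cycle can arise.
    """
    if size is None:
        return False
    m, n = size
    return not area(root, m, n).isdisjoint(deps)

B1, B2, B3 = (1, 2), (2, 2), (3, 2)
deps = {B2, B3}                            # both branches read the range B2:B3

print(has_spill_cycle(B1, None, deps))     # A1 = 42: SUM(B2:B3) is a scalar
print(has_spill_cycle(B1, (2, 1), deps))   # A1 = 43: INC(B2:B3) is {0;0}
```

In the second call the area {B1, B2} intersects the dependency set at B2, which is exactly the situation described above.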


3 Core Calculus for Spreadsheets 


In this section we present a core calculus for spreadsheets that serves as the 
foundation of our technical developments. 


3.1 Syntax 


Figure 1 presents the syntax of the core calculus. Let a and b range over A1-style 
addresses, written Nm, composed from a column name N and row index m. A 
column name is a base-26 numeral written using the symbols A..Z. A row index 
is a decimal numeral written as usual. Let m and n range over positive natural 
numbers which we typically use to denote row or array indices. We assume a 
locale in which rows are numbered from top to bottom, and columns from left to 
right, so that A1 is the top-left cell of the sheet. We use the terms address and cell 
interchangeably. Let r range over ranges that are pairs of addresses that denote 
a rectangular region of a grid. Modern spreadsheet systems do not restrict which 


Higher-Order Spreadsheets with Spilled Arrays 749 


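A1-style column name   $N ::= \mathrm{A} \mid \ldots \mid \mathrm{Z} \mid \mathrm{AA} \mid \mathrm{AB} \mid \ldots$ 
                       $m, n \in \mathbb{N}_1$ 
Address                $a, b ::= N\,m$ 
Range                  $r ::= a_1{:}a_2$ 
Value                  $V ::= \epsilon \mid c \mid \mathrm{ERR} \mid \{V_{i,j}^{\,i \in 1..m,\, j \in 1..n}\}$ 
Formula                $F ::= V \mid r \mid f(F_1, \ldots, F_n)$   (f function name) 
Sheet                  $S ::= [a_i \mapsto F_i^{\,i \in 1..n}]$   ($a_i$ distinct and no $F_i = \epsilon$) 
Grid                   $\gamma ::= [a_i \mapsto V_i^{\,i \in 1..n}]$   ($a_i$ distinct) 


Fig. 1. Syntax for Core Calculus 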


corners of a rectangle are denoted by a range but will automatically normalise the 
range to represent the top-left and bottom-right corners. We implicitly assume 
that all ranges are written in the normalised form such that range B1:A2 does 
not occur; instead, the range is denoted A1:B2. 
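
As a small illustration of A1-style addresses and range normalisation, the Python sketch below parses addresses, normalises a range to its top-left and bottom-right corners, and computes its dimensions as (rows, columns). All function names are assumptions of this sketch.

```python
import re

def col_to_num(name):
    """Column name ('A', 'Z', 'AA', ...) to a 1-based column number."""
    number = 0
    for ch in name:
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number

def num_to_col(number):
    """1-based column number back to a column name."""
    name = ""
    while number > 0:
        number, rem = divmod(number - 1, 26)
        name = chr(ord("A") + rem) + name
    return name

def parse(addr):
    """Split an address Nm into (column number, row index)."""
    match = re.fullmatch(r"([A-Z]+)([1-9][0-9]*)", addr)
    if match is None:
        raise ValueError(f"not an A1-style address: {addr!r}")
    return col_to_num(match.group(1)), int(match.group(2))

def normalise(rng):
    """Rewrite a range so it names its top-left and bottom-right corners."""
    first, second = rng.split(":")
    (c1, r1), (c2, r2) = parse(first), parse(second)
    top_left = f"{num_to_col(min(c1, c2))}{min(r1, r2)}"
    bottom_right = f"{num_to_col(max(c1, c2))}{max(r1, r2)}"
    return f"{top_left}:{bottom_right}"

def range_size(rng):
    """Dimensions (rows, columns) of a range."""
    first, second = normalise(rng).split(":")
    (c1, r1), (c2, r2) = parse(first), parse(second)
    return r2 - r1 + 1, c2 - c1 + 1

print(normalise("B1:A2"), range_size("B1:A2"))
```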

A value V is either the blank value $\epsilon$, a constant c, an error ERR, or a 
two-dimensional array $\{V_{i,j}^{\,i \in 1..m,\, j \in 1..n}\}$. We write 
$\{V_{i,j}^{\,i \in 1..m,\, j \in 1..n}\}$ as short for the array literal 
$\{V_{1,1}, \ldots, V_{1,n}; \ldots; V_{m,1}, \ldots, V_{m,n}\}$. 

Let F range over formulas. A formula is either a value V, a range r, or a 
function application $f(F_1, \ldots, F_n)$, where f ranges over names of pre-defined 
worksheet functions such as SUM or PRODUCT. 

Let S range over sheets, where a sheet is a partial function from addresses 
to formulas that has finite domain. We write $[]$ to denote the empty map, and 
we write $S[a \mapsto F]$ to denote the extension of S to map address a to formula 
F, potentially shadowing an existing mapping. We do not model the maximum 
numbers of rows or columns imposed by some implementations. Each finite S 
represents an unbounded sheet that is almost everywhere blank: we say a cell a 
is blank to mean that a is not in the domain of S. 

Let $\gamma$ range over grids, where a grid is a partial function from addresses to 
values that has finite domain. A grid can be viewed as a function that assigns 
values to addresses, obtained by evaluating a sheet. 


3.2 Operational Semantics 


Figure 2 presents the operational semantics of the core calculus. Auxiliary defi- 
nitions are present at the top of Figure 2. 


Formula Evaluation The relation $S \vdash F \Downarrow V$ means that in sheet S, formula 
F evaluates to value V. A value V evaluates to itself. A function application 
$f(F_1, \ldots, F_n)$ evaluates to V if the result of applying $[\![f]\!]$ to evaluated arguments 
is V, where $[\![f]\!]$ is the underlying semantics of f, a total function on values. A 
single cell range a:a evaluates to V if address a dereferences to V. A multiple 
cell range $a_1{:}a_2$ evaluates to an array of the same dimensions, where each value 
in the array is obtained by dereferencing the corresponding single cell within the 
range. We write $\mathit{size}(a_1{:}a_2)$ to denote the operation that returns the dimensions 
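$\mathit{size}(N_1 m_1 : N_2 m_2) = (m_2 - m_1 + 1,\; N_2 - N_1 + 1)$ 
$N m + (i, j) = (N + j - 1)(m + i - 1)$ 


Formula evaluation: $S \vdash F \Downarrow V$ 

\[
\frac{}{S \vdash V \Downarrow V}
\qquad
\frac{S \vdash F_i \Downarrow V_i \quad [\![f]\!](V_1, \ldots, V_n) = V}{S \vdash f(F_1, \ldots, F_n) \Downarrow V}
\qquad
\frac{S \vdash a \,!\, V}{S \vdash a{:}a \Downarrow V}
\]

\[
\frac{a_1 \neq a_2 \quad \mathit{size}(a_1{:}a_2) = (m, n) \quad \forall i \in 1..m,\, j \in 1..n.\; S \vdash (a_1 + (i, j)) \,!\, V_{i,j}}{S \vdash a_1{:}a_2 \Downarrow \{V_{i,j}^{\,i \in 1..m,\, j \in 1..n}\}}
\]


Address dereferencing: $S \vdash a \,!\, V$ 

\[
\frac{S(a) = F \quad S \vdash F \Downarrow V}{S \vdash a \,!\, V}
\qquad
\frac{a \notin \mathit{dom}(S)}{S \vdash a \,!\, \epsilon}
\]


Sheet evaluation: $S \Downarrow \gamma$ 

\[
S \Downarrow \gamma \;\stackrel{\text{def}}{=}\; \forall a \in \mathit{dom}(S).\; S \vdash a \,!\, \gamma(a)
\]


Fig. 2. Operational Semantics for Core Calculus 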


of a range written (m,n), where m is the number of rows, and n is the number of 
columns. We write a + (i,j) to denote the address offset below and to the right of a 
by $i - 1$ rows and $j - 1$ columns. For example, a + (1,1) maps to a, and a + (1,2) 
maps to the address immediately to the right of a. Both $\mathit{size}(a_1{:}a_2)$ and a + (i,j) 
are defined in Figure 2. 


Address Dereferencing The relation $S \vdash a \,!\, V$ means that in sheet S, address a 
dereferences to V. If address a maps to formula F in sheet S, then dereferencing 
a returns V when F evaluates to V. If address a is not in the domain of S then 
dereferencing a returns the blank value $\epsilon$. We make range evaluation and address 
dereferencing distinct relations to aid our presentation in Section 4. 


Sheet Evaluation The relation $S \Downarrow \gamma$ means that sheet S evaluates to grid $\gamma$ 
and the relation is defined by point-wise dereferencing of every address in the 
sheet. Recall the spreadsheet design principles of determinism and acyclicity 
from Section 2.1. The relations of our semantics are partial functions (as stated 
in Appendix A of the extended version [21]). As for acyclicity, if there is a cycle 
where S(a) = F and evaluation of formula F must dereference cell a, then we 
cannot derive $S \vdash F \Downarrow V$ for any V. Although our calculus could be modified to 
model a detection mechanism for cell cycles, we omit any such mechanism for 
the sake of simplicity. 
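
The three relations of the core calculus translate almost directly into a recursive evaluator. The Python sketch below is one possible rendering under stated assumptions: addresses are (row, column) pairs, formulas are tagged tuples `("range", a1, a2)` and `("call", f, args)`, the blank value is `None`, and the worksheet function `SUM` is a stand-in defined only for this example.

```python
BLANK = None

def evaluate(sheet, formula):
    """Formula evaluation: the value V such that formula evaluates to V in sheet."""
    if isinstance(formula, tuple) and formula[0] == "range":
        _, (r1, c1), (r2, c2) = formula
        if (r1, c1) == (r2, c2):                 # single cell range a:a
            return deref(sheet, (r1, c1))
        return [[deref(sheet, (r, c)) for c in range(c1, c2 + 1)]
                for r in range(r1, r2 + 1)]      # multiple cell range
    if isinstance(formula, tuple) and formula[0] == "call":
        _, fn, args = formula
        return fn(*[evaluate(sheet, arg) for arg in args])
    return formula                               # a value evaluates to itself

def deref(sheet, addr):
    """Address dereferencing: blank cells yield the blank value."""
    if addr not in sheet:
        return BLANK
    return evaluate(sheet, sheet[addr])

def evaluate_sheet(sheet):
    """Sheet evaluation: point-wise dereferencing of every bound address."""
    return {addr: deref(sheet, addr) for addr in sheet}

def SUM(value):
    """Stand-in worksheet function: sums the numbers in a value, skipping blanks."""
    cells = [x for row in value for x in row] if isinstance(value, list) else [value]
    return sum(x for x in cells if isinstance(x, (int, float)))

A1, A2, A3, B1 = (1, 1), (2, 1), (3, 1), (1, 2)
sheet = {A1: 10, A2: 20, B1: ("call", SUM, [("range", A1, A3)])}
print(evaluate_sheet(sheet))
```

A sheet containing a cell cycle makes this evaluator recurse without bound, loosely mirroring the fact that no derivation exists for such a sheet.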
Formula          $F ::= \cdots \mid a\#$   (postfix operator) 
Dependency set   $D ::= \{a_1, \ldots, a_n\}$ 
Grid             $\gamma ::= [a_i \mapsto (V_i^{\#}, V_i', D_i)^{\,i \in 1..n}]$   ($a_i$ distinct) 
Spill permit     $p ::= \checkmark \mid \times$ 
Spill oracle     $\omega ::= [a_i \mapsto (m_i, n_i, p_i)^{\,i \in 1..n}]$   ($a_i$ distinct) 


Fig. 3. Syntax for Spill Calculus (Extends and modifies Figure 1) 


4 Spill Calculus: Core Calculus with Spilled Arrays 


The spill calculus, presented in this section, is the first formalism to explain the 
semantics of arrays that spill out of cells in spreadsheets. The spill calculus and 
its convergence, Theorem 1, is our first main technical contribution. 


4.1 Syntax 


Figure 3 presents the extensions and modifications to the syntax of Figure 1; we 
omit syntax classes that remain unchanged. 

Let F range over formulas, extended to include the postfix root operator a#. 
The root operator a# evaluates to an array if address a is a spill root. Accessing 
an array via the root operator instead of a fixed-size range is more robust to 
future edits. For example, consider the sheet $[A1 \mapsto F, B1 \mapsto SUM(A1{:}A10)]$ 
where formula F evaluates to a (10,1) array. If the user modifies F such that 
the formula evaluates to an array of size (11,1) then the summation in B1 still 
applies only to the first ten elements that spill from A1, even if the user intends 
to sum the whole array. The root operator allows a more robust formulation: 
$[A1 \mapsto F, B1 \mapsto SUM(A1\#)]$. The summation in B1 applies to the entire array 
that spills from A1, regardless of its size. Section 4.3 shows the full semantics of 
the root operator. 

Let D range over dependency sets, which denote a set of addresses that a 
formula bound to an address depends on. 

Let $\gamma$ range over grids, which now map addresses to tuples of the form 
$(V^{\#}, V', D)$. If $\gamma(a) = (V^{\#}, V', D)$ then $V^{\#}$ is the pre-spill value obtained by 
applying the root operator # to a, while $V'$ is the post-spill value obtained 
by evaluating a, and D is the dependency set required to dereference a. Each 
dereferenced address has both a pre-spill and post-spill value, even if the cell 
content does not spill. If the pre-spill value is not an array, it cannot spill, and 
the post-spill value equals the pre-spill value. 

Let p range over spill permits, where $\checkmark$ denotes that a root is permitted to 
spill and $\times$ denotes that it is not. 

Let $\omega$ range over spill oracles, which map addresses to tuples of the form 
(m,n,p). A spill oracle governs how arrays spill in a sheet. 


— If $\omega(a) = (m, n, p)$ we expect a to be a spill root for an (m,n) array: 
   — If $p = \checkmark$ the contents of a can spill with no obstruction. 
   — If $p = \times$ then a cannot spill because either a formula obstructs the spill 
     area, or another spill root will spill into the area. 


Let $S \stackrel{\text{def}}{=} [A1 \mapsto \{7;8\},\; B1 \mapsto IF(A2 = 8, \{9;10\}, 100)]$ 


     A        B            A     B             A     B 
1 | {7;8}    100       1 | 7     {9;10}    1 | 7     9 
2 |                    2 | 8               2 | 8     10 

Round 1: $\omega_1 = []$ 
Round 2: $\omega_2 = [A1 \mapsto (2,1,\checkmark)]$ 
Round 3: $\omega_3 = [A1 \mapsto (2,1,\checkmark),\; B1 \mapsto (2,1,\checkmark)]$ 


Fig. 4. Example Spill Iteration 


Oracles track the size of each spilled array so we can find the spill root a of any 
spill target, and hence obtain the value for a spill target by dereferencing a. 


4.2 Spill Oracles and Iteration 


As discussed in Section 2.2, spill collisions have the potential to introduce 
non-determinism if not handled appropriately. Our solution is to evaluate a sheet 
in a series of rounds, each determined by a spill oracle. Given a sheet, a grid is 
induced by evaluating the sheet and using the oracle to deterministically predict 
how each root spills. We then compare the induced grid with the oracle and look 
for discrepancies. A discrepancy could be a new spill root the oracle missed, or an 
existing spill root with dimensions differing from the oracle. If any discrepancies 
are found we compute a new oracle, and start a new round. Iteration halts when 
the oracle is consistent with the induced grid. The notion of a consistent oracle 
is defined in Section 4.4. We can view the iteration as a sequence of n oracles 
where only the final oracle is consistent: 


$[] = \omega_1 \longrightarrow \omega_2 \longrightarrow \cdots \longrightarrow \omega_n$ and $\omega_n$ is consistent 


Consider the example in Figure 4. At the top we show the bindings of the sheet; 
at the bottom we show the oracle and induced grid for each round of spilling. 
We define the initial spill oracle as $\omega_1 = []$ and in the first round the oracle 
is empty. An empty oracle anticipates no spill roots and therefore no roots are 
permitted to spill. The array in A1 remains collapsed and B1 evaluates using the 
false branch. Once the sheet has been fully evaluated we determine that $\omega_1$ was 
not a consistent prediction because there is an array in A1 with no corresponding 
entry in $\omega_1$. We compute a new oracle that determines that A1 is allowed to spill 
because the area is blank. We define the new oracle as $\omega_2 = [A1 \mapsto (2,1,\checkmark)]$. 
In the second round the root A1 is permitted to spill by the oracle and as a 
consequence B1 now evaluates to the array {9;10}; this array is not anticipated 
by the oracle and remains collapsed. Once the sheet has been fully evaluated we 
determine that $\omega_2$ was not a consistent prediction because there is an array in 
B1 with no corresponding entry in $\omega_2$. We compute a new oracle that determines 
that B1 is allowed to spill because the area is blank in the grid induced by $\omega_2$. 
We define the third oracle as $\omega_3 = [A1 \mapsto (2,1,\checkmark), B1 \mapsto (2,1,\checkmark)]$. 

In the third and final round the root A1 is permitted to spill by the oracle 
and B1 evaluates to the array {9;10}. This time the oracle anticipates the root 
in B1 and permits the array to spill. Once the sheet has been fully evaluated we 
determine that $\omega_3$ is a consistent prediction because the spill roots A1 and B1 
are contained in the oracle. The iteration is the sequence of three oracles: 


$[] \longrightarrow [A1 \mapsto (2,1,\checkmark)] \longrightarrow [A1 \mapsto (2,1,\checkmark),\; B1 \mapsto (2,1,\checkmark)]$ 


Spill Rejection Spill oracles explicitly track the anticipated size of the array 
to ensure that spill rejections based on incorrect dimensions can be corrected. 


Consider the following example: 

     A          B                                   C 
1 |            IF(C2 = 2, {10;20}, {10;20;30})     {1;2} 
2 | 
3 | {1,2,3} 


After the first round using an empty spill oracle there are three spill roots: 
A3 = {1,2,3}, B1 = {10; 20;30}, and C1 = {1;2}. There is sufficient space to 
spill C1 but only space to spill one of A3 and B1; the decision is resolved using 
the total ordering on addresses. Suppose that we allow A3 to spill such that the 
new oracle is: $[A3 \mapsto (1,3,\checkmark),\; B1 \mapsto (3,1,\times),\; C1 \mapsto (2,1,\checkmark)]$. 

After the second round we find that address B1 returns an array of a smaller 
size because the root C1 spills into C2. Previously we thought B1 was too big to 
spill but with the new oracle we find there is now sufficient room; by explicitly 
recording the anticipated size it is possible to identify cases that require further 
refinement. We compute the new oracle 
$[A3 \mapsto (1,3,\checkmark),\; B1 \mapsto (2,1,\checkmark),\; C1 \mapsto (2,1,\checkmark)]$, which is consistent. 

An interesting limitation arises if the total ordering places B1 before A3, 
which we discuss in Section 4.6. 


4.3 Operational Semantics 


Figure 5 presents the operational semantics for the spill calculus. The key 
additions to the relations for formula evaluation and address dereferencing are an 
oracle $\omega$ that is part of the context, and a dependency set D that is part of the 
output. We discuss each relation in turn and focus on the extensions and 
modifications from Figure 2. Auxiliary definitions are present at the top of Figure 5. 


Formula Evaluation: $S, \omega \vdash F \Downarrow V, D$ The spill oracle $\omega$ is not inspected by the 
relation but is threaded through the definition. Dependency set D denotes the 
transitive dependencies required to evaluate F. Evaluating a value or function 
application is as before, except we additionally compute the dependencies of the 
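$\mathit{owners}(\omega, a) = \{(a_r, i, j) \mid \omega(a_r) = (m, n, \checkmark) \text{ and } a_r + (i, j) = a \text{ and } (i, j) \leq (m, n)\}$ 
$\mathit{area}(a, m, n) = \{a + (i, j) \mid \forall i \in 1..m, \forall j \in 1..n\}$ 
$\mathit{size}(V) = (m, n)$ if $V = \{V_{i,j}^{\,i \in 1..m,\, j \in 1..n}\}$, and $\bot$ otherwise 


Formula evaluation: $S, \omega \vdash F \Downarrow V, D$ 

\[
\frac{}{S, \omega \vdash V \Downarrow V, \emptyset}
\qquad
\frac{S, \omega \vdash F_i \Downarrow V_i, D_i \quad [\![f]\!](V_1, \ldots, V_n) = V}{S, \omega \vdash f(F_1, \ldots, F_n) \Downarrow V, \bigcup_{i=1}^{n} D_i}
\]

\[
\frac{S, \omega \vdash a \,!\, V^{\#}, V', D}{S, \omega \vdash a\# \Downarrow V^{\#}, D \cup \{a\}}
\qquad
\frac{S, \omega \vdash a \,!\, V^{\#}, V', D}{S, \omega \vdash a{:}a \Downarrow V', D \cup \{a\}}
\]

\[
\frac{a_1 \neq a_2 \quad \mathit{size}(a_1{:}a_2) = (m, n) \quad \forall i \in 1..m,\, j \in 1..n.\; S, \omega \vdash a_1 + (i, j) \,!\, V^{\#}_{i,j}, V'_{i,j}, D_{i,j}}{S, \omega \vdash a_1{:}a_2 \Downarrow \{{V'_{i,j}}^{\,i \in 1..m,\, j \in 1..n}\},\; \bigcup_{i,j=1,1}^{m,n} D_{i,j} \cup \{a_1 + (i, j)\}}
\]


Address dereferencing: $S, \omega \vdash a \,!\, V^{\#}, V', D$ 

\[
\frac{\mathit{owners}(\omega, a) = \emptyset \quad a \notin \mathit{dom}(\omega) \quad S(a) = F \quad S, \omega \vdash F \Downarrow V, D}{S, \omega \vdash a \,!\, V, V, D} \; (1)
\]

\[
\frac{\mathit{owners}(\omega, a) = \emptyset \quad a \notin \mathit{dom}(\omega) \quad a \notin \mathit{dom}(S)}{S, \omega \vdash a \,!\, \epsilon, \epsilon, \emptyset} \; (2)
\]

\[
\frac{\mathit{owners}(\omega, a) = \emptyset \quad \omega(a) = (m, n, \times) \quad S(a) = F \quad S, \omega \vdash F \Downarrow V, D}{S, \omega \vdash a \,!\, V, \mathrm{ERR}, D} \; (3)
\]

\[
\frac{\begin{array}{c}(a_r, i, j) \in \mathit{owners}(\omega, a) \quad \omega(a_r) = (m, n, \checkmark) \quad S(a_r) = F \\ S, \omega \setminus a_r \vdash F \Downarrow V, D \quad \mathit{size}(V) = (m, n) \quad \mathit{area}(a_r, m, n) \cap D = \emptyset\end{array}}{S, \omega \vdash a \,!\, (a = a_r \,?\, V : \epsilon),\; V_{i,j},\; D} \; (4)
\]

\[
\frac{\begin{array}{c}(a_r, i, j) \in \mathit{owners}(\omega, a) \\ \omega(a_r) = (m, n, \checkmark) \quad S(a_r) = F \quad S, \omega \setminus a_r \vdash F \Downarrow V, D \quad \mathit{size}(V) \neq (m, n)\end{array}}{S, \omega \vdash a \,!\, (a = a_r \,?\, V : \epsilon),\; (a = a_r \,?\, V : \epsilon),\; (a = a_r \,?\, D : \emptyset)} \; (5)
\]


Sheet evaluation: $S, \omega \Downarrow \gamma$ 

\[
S, \omega \Downarrow \gamma \;\stackrel{\text{def}}{=}\; \forall a \in \mathit{dom}(S).\; S, \omega \vdash a \,!\, \gamma(a)
\]


Fig. 5. Operational Semantics for Spill Calculus 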
formula. The dependency set required to evaluate a value is $\emptyset$. The dependency 
set required to evaluate a function application is the union of the dependencies 
of the arguments. Evaluating a root operation a# dereferences a and returns the 
pre-spill value $V^{\#}$. The dependency set required to evaluate a root operation a# 
is the dependency set required to dereference a and the address a itself. 
Evaluating a single cell range a:a dereferences a and returns the post-spill value $V'$. The 
dependency set required to evaluate a single cell range a:a is the dependency 
set required to dereference a and the address a itself. Evaluating a multiple cell 
range $a_1{:}a_2$ returns an array of the same dimensions, where each value in the 
array is obtained by dereferencing the corresponding single cell and extracting the 
post-spill value. The dependency set required to evaluate a multiple cell range is 
the dependency set required to dereference every address in the range, and the 
range itself. 


Address dereferencing The relation $S, \omega \vdash a \,!\, V^{\#}, V', D$ means that in sheet S 
with oracle $\omega$, address a dereferences to pre-spill value $V^{\#}$ and post-spill value 
$V'$, and depends upon the addresses in D. Five rules govern address 
dereferencing, based on spill oracle $\omega$ and owners set $\mathit{owners}(\omega, a)$. 

The set $\mathit{owners}(\omega, a)$ is key to the operational semantics and denotes the set of 
owners for address a. If a tuple $(a_r, i, j)$ is in the set $\mathit{owners}(\omega, a)$, we say $a_r$ owns 
a, meaning that $a_r$ is a spill root that we expect to spill into address a, and that a 
is offset from $a_r$ by $i - 1$ rows and $j - 1$ columns. Hence, to dereference a we must 
first compute the root $a_r$ and extract the (i,j)th spilled value from the root array. 
Our definition allows an address to own itself, denoted $(a, 1, 1) \in \mathit{owners}(\omega, a)$, 
and does not preclude an address having multiple owners, violating the 
single-spill policy. We enforce the single-spill policy in our technical results using an 
additional well-formedness condition on oracles, defined in Section 4.5. 
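
The owners set can be computed by scanning the permitted entries of the oracle, as in the following Python sketch. Encoding addresses as (row, column) pairs and the permit as a boolean are assumptions of this illustration.

```python
def owners(oracle, a):
    """Set of (root, i, j) such that the oracle expects root to spill into a.

    oracle maps a root address (row, col) to (m, n, permitted); only
    permitted roots can own an address.
    """
    row, col = a
    found = set()
    for (root_row, root_col), (m, n, permitted) in oracle.items():
        i, j = row - root_row + 1, col - root_col + 1   # offset of a from the root
        if permitted and 1 <= i <= m and 1 <= j <= n:
            found.add(((root_row, root_col), i, j))
    return found

A2, B1, B2 = (2, 1), (1, 2), (2, 2)

single = {B1: (3, 1, True)}
print(owners(single, B2))        # B1 owns B2 at offset (2, 1)
print(owners(single, B1))        # a permitted root owns itself at (1, 1)

# Nothing in the definition rules out two owners for one address.
colliding = {A2: (1, 2, True), B1: (2, 1, True)}
print(owners(colliding, B2))
```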

Rule (1) applies when the address has no owner, the address is not a spill 
root, and the address has a formula binding in S. The pre-spill and post-spill 
values are the value obtained by evaluating the bound formula. 

Rule (2) applies when the address has no owner, the address is not a spill 
root, and the address has no formula binding in S. The pre-spill and post-spill 
values are the blank value $\epsilon$ and the dependency set is empty. Rules (1) and (2) 
correspond to the address dereferencing behaviour described in the core calculus 
(Section 3) which is lifted to the new relation. 

Rule (3) applies when the address is a spill root and the root is not 
permitted to spill. The pre-spill value is the value obtained by evaluating the 
bound formula; the post-spill value is an error value. If the address has no bound 
formula then the relation is undefined. 

Rules (4) and (5) apply when an address with an owner is dereferenced. The 
owner $a_r$ is omitted from the spill oracle before evaluating the associated formula, 
denoted by $S, \omega \setminus a_r \vdash F \Downarrow V, D$. This prevents cycles when the oracle incorrectly 
expects the root to spill, but the root does not, and instead depends on the 
expected spill area. For example, consider B1 = SUM(B2:B3) and $\omega = [B1 \mapsto (3,1,\checkmark)]$. 
The address B1 owns B2 according to $\omega$, therefore dereferencing address B2 
requires dereferencing B1, which in turn depends on B2. If we did not remove 
B1 from $\omega$ when evaluating the formula bound to B1 we would create a cycle. We 
remove B1 from $\omega$ so that when formula SUM(B2:B3) dereferences B2 a blank 
value is returned. Genuine spill cycles are detected post-dereferencing using the 
dependency set. 

Rule (4) applies when the address has an owner and the formula bound to 
the owner evaluates to an array of the expected size according to $\omega$. This rule is 
only defined when the intersection of the spill root’s dependencies and its spill 
area is empty, preventing spill cycles. The pre-spill value is obtained using the 
conditional operator $a = a_r \,?\, V : \epsilon$. When the dereferenced cell is the root then 
the value is the root array, otherwise the value is blank. The post-spill value is 
obtained by indexing into the root array at the (i,j)th position. 

Rule (5) applies when the address has an owner and the formula bound to the 
owner does not evaluate to an array of the expected size according to w. In this 
case there is no attempt to spill as the oracle is incorrect. When the dereferenced 
address is the root then the pre-spill and post-spill values are obtained from the 
formula, otherwise the pre-spill and post-spill values are blank. 


Sheet evaluation: $S, \omega \Downarrow \gamma$. Sheet evaluation in the spill calculus accepts a spill 
oracle, but is otherwise unchanged from sheet evaluation in the core calculus. The 
computed grid only contains the value of addresses with a bound formula, and 
does not include the value of any blank cells that are in a spill area. In contrast, 
a spreadsheet application would display the value for all addresses, including 
those within a spill area. Obtaining this view can be done by dereferencing 
every address in the viewport using the sheet and oracle. 


4.4 Oracle Refinement 


We have shown how to compute a grid given a sheet and oracle, but we have not 
considered the accuracy of the predictions provided by the oracle. In Section 4.2 
we informally describe an iterative process to refine an oracle from a computed 
grid; in this section we give the precise semantics of oracle refinement. Figure 6 
presents the full definition of oracle refinement. 


Consistency. The relation $\gamma \models \omega$ states that grid $\gamma$ is consistent with oracle 
$\omega$. A grid is consistent if every address is consistent, written $\gamma \models_a \omega$. An 
address $a$ is consistent in $\gamma$ and $\omega$ if, and only if, the grid and oracle agree 
on the size of the value at address $a$. Consistency tells us that the oracle has 
correctly predicted the location and size of every spill root in the grid, and 
has not predicted any spurious roots. 
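
As an illustration only, the consistency check can be sketched in Python. The sketch assumes a representation that is not part of the calculus: addresses are (row, column) pairs, an oracle maps a root to (m, n, permit), a grid maps an address to (pre-spill value, post-spill value, dependency set), arrays are nested lists, and `size` is defined only for arrays. It reads the definition as "the oracle predicts a size at an address exactly when the pre-spill value there is an array of that size"; the permit plays no part.

```python
from typing import Any, Dict, FrozenSet, Optional, Tuple

Addr = Tuple[int, int]                               # (row, column), 1-indexed
Oracle = Dict[Addr, Tuple[int, int, bool]]           # address -> (m, n, permit)
Grid = Dict[Addr, Tuple[Any, Any, FrozenSet[Addr]]]  # address -> (pre, post, deps)


def size(value: Any) -> Optional[Tuple[int, int]]:
    """Return (rows, columns) for an array value, or None for a scalar."""
    if isinstance(value, list) and value and isinstance(value[0], list):
        return (len(value), len(value[0]))
    return None


def consistent_at(grid: Grid, oracle: Oracle, a: Addr) -> bool:
    """The oracle predicts a size at `a` exactly when the pre-spill value has it."""
    predicted = oracle[a][:2] if a in oracle else None
    actual = size(grid[a][0]) if a in grid else None
    return predicted == actual


def consistent(grid: Grid, oracle: Oracle) -> bool:
    """Check every address mentioned by either the grid or the oracle."""
    return all(consistent_at(grid, oracle, a) for a in set(grid) | set(oracle))
```

Addresses mentioned by neither the grid nor the oracle are trivially consistent, which is why the check only visits the union of the two domains.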


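Refinement. The function $\mathit{refine}(S, \omega, \gamma)$ takes an inconsistent oracle and 
returns a new oracle that is refined using the computed grid. The function is 
defined as follows. First, start with the subset $\omega_{ok}$ of $\omega$ that is consistent 
with $\gamma$. Second, collect the remaining unresolved spill roots in $\gamma$, denoted 
$\gamma_r$. Finally, recursively select the smallest address in $\gamma_r$ according to a 
total order on addresses, determining whether the root is permitted to spill and 
adding the permit to the accumulating oracle. A root is permitted to spill if 
the potential spill area is blank (excluding the root itself) and each address 
in the spill area has no owner, thereby preserving the single-spill policy. 


$$\gamma \models_a \omega \;\triangleq\; \forall m, n, p.\ (\omega(a) = (m, n, p)) \Leftrightarrow \exists V^{*}, V', D.\ (\gamma(a) = (V^{*}, V', D) \wedge \mathit{size}(V^{*}) = (m, n))$$

$$\gamma \models \omega \;\triangleq\; \forall a.\ \gamma \models_a \omega$$


$$\mathit{refine}(S, \omega, \gamma) = \mathit{decide}(S, \omega_{ok}, \gamma_r) \quad \text{where}$$

$$\omega_{ok} = \{ a \mapsto (m, n, p) \in \omega \mid \gamma \models_a \omega \}$$

$$\gamma_r = \{ a \mapsto (V^{*}, V', D) \in \gamma \mid \exists m, n.\ \mathit{size}(V^{*}) = (m, n) \text{ and } a \notin \mathrm{dom}(\omega_{ok}) \}$$


$$\mathit{decide}(S, \omega, []) = \omega$$

$$\mathit{decide}(S, \omega, \gamma[a \mapsto (V^{*}, V', D)]) = \mathit{decide}(S, \omega[a \mapsto (m, n, p)], \gamma)$$

where $a$ is the least element in $\mathrm{dom}(\gamma)$, $\mathit{size}(V^{*}) = (m, n)$, and 

$$p = \begin{cases} \checkmark & \text{if } \forall a_t \in \mathit{area}(a, m, n).\ a \neq a_t \Rightarrow a_t \notin \mathrm{dom}(S) \text{ and } \mathit{owners}(\omega, a_t) = \emptyset \\ \times & \text{otherwise} \end{cases}$$


Spill iteration: $\omega \longrightarrow_S \omega'$ 

$$\frac{S, \omega \Downarrow \gamma \qquad \gamma \not\models \omega \qquad \mathit{refine}(S, \omega, \gamma) = \omega'}{\omega \longrightarrow_S \omega'}$$

Final oracle: $S \vdash \omega\ \mathsf{final}$ 

$$\frac{S, \omega \Downarrow \gamma \qquad \gamma \models \omega}{S \vdash \omega\ \mathsf{final}}$$


Final sheet evaluation: $S \Downarrow \gamma$ 

$$S \Downarrow \gamma \;\triangleq\; [] \longrightarrow_S^{*} \omega \text{ and } S \vdash \omega\ \mathsf{final} \text{ and } S, \omega \Downarrow \gamma$$


Fig. 6. Oracle Refinement 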
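
The refinement step can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the definition itself: addresses are (row, column) pairs ordered row-major (the calculus only requires some total order), a sheet is any container of bound addresses, `owners` counts only roots whose permit is granted, and the ownership test is applied to every cell of the candidate area, following the prose description of when a root may spill.

```python
def size(value):
    """(rows, columns) for a nested-list array, None for a scalar."""
    if isinstance(value, list) and value and isinstance(value[0], list):
        return (len(value), len(value[0]))
    return None


def area(root, m, n):
    """All addresses covered by an m-by-n array rooted at `root`."""
    r, c = root
    return [(r + i, c + j) for i in range(m) for j in range(n)]


def owners(oracle, a):
    """Roots that are permitted to spill and whose area contains `a`."""
    return [root for root, (m, n, permit) in oracle.items()
            if permit and a in area(root, m, n)]


def consistent_at(grid, oracle, a):
    predicted = oracle[a][:2] if a in oracle else None
    actual = size(grid[a][0]) if a in grid else None
    return predicted == actual


def decide(sheet, oracle, roots):
    """Assign a permit to each unresolved root, smallest address first."""
    oracle = dict(oracle)
    for a in sorted(roots):
        m, n = size(roots[a][0])
        cells = area(a, m, n)
        blank = all(t not in sheet for t in cells if t != a)
        unowned = all(not owners(oracle, t) for t in cells)
        oracle[a] = (m, n, blank and unowned)
    return oracle


def refine(sheet, oracle, grid):
    """Keep the consistent predictions, then decide the remaining roots."""
    ok = {a: pred for a, pred in oracle.items() if consistent_at(grid, oracle, a)}
    roots = {a: entry for a, entry in grid.items()
             if size(entry[0]) is not None and a not in ok}
    return decide(sheet, ok, roots)
```

Because `decide` extends the oracle as it goes, a root that is processed later sees the areas already claimed by earlier roots, which is how the single-spill policy is maintained.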


Spill iteration. The relation $\omega \longrightarrow_S \omega'$ denotes a single iteration of oracle 
refinement. When a computed grid is not consistent with the spill oracle that 
induced it, written $\gamma \not\models \omega$, a new oracle is produced using function 
$\mathit{refine}(S, \omega, \gamma)$. We write $\longrightarrow_S^{*}$ for the reflexive and transitive closure 
of $\longrightarrow_S$. 


Final oracle. The relation $S \vdash \omega\ \mathsf{final}$ states that oracle $\omega$ is final for sheet 
$S$, and is valid when the grid induced by $\omega$ is consistent with $\omega$. 


Final sheet evaluation. The relation $S \Downarrow \gamma$ denotes the evaluation of sheet $S$ 
to grid $\gamma$ which implicitly refines an oracle to a final state. The process starts 
with an empty oracle $[]$ and iterates until a final oracle is found. 
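
The overall loop is a fixed-point iteration, which the following Python sketch makes explicit. The driver is generic: `evaluate`, `refine` and `consistent` are supplied by the caller, and the `max_steps` bound is an addition of this sketch (the calculus itself has no step limit). The toy callbacks model a hypothetical one-cell sheet whose root always evaluates to a 2-by-1 array, with the grid abstracted to a map from address to array size.

```python
def final_oracle(evaluate, refine, consistent, max_steps=1000):
    """Iterate oracle refinement from the empty oracle until it is final."""
    oracle = {}
    for _ in range(max_steps):
        grid = evaluate(oracle)
        if consistent(grid, oracle):
            return oracle, grid          # the oracle is final for this sheet
        oracle = refine(oracle, grid)
    raise RuntimeError("no final oracle found within max_steps")


# Toy callbacks (hypothetical): a single root at A1 holding a 2x1 array.
def toy_evaluate(oracle):
    return {(1, 1): (2, 1)}              # address -> size of the array found there


def toy_consistent(grid, oracle):
    return {a: pred[:2] for a, pred in oracle.items()} == grid


def toy_refine(oracle, grid):
    return {a: (m, n, True) for a, (m, n) in grid.items()}
```

With the toy callbacks, `final_oracle(toy_evaluate, toy_refine, toy_consistent)` needs one refinement: the empty oracle is inconsistent with the grid, and the refined oracle predicts the 2-by-1 root and is final.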
4.5 Technical Results 


This section presents the main technical result of the spill calculus: that iteration 
of oracle refinement converges for well-behaved sheets. We begin with prelimi- 
nary definitions and results. 

To avoid ambiguous evaluation every spill area must be disjoint and unob- 
structed; an oracle is well-formed if it predicts non-blank spill roots, and predicts 
disjoint and unobstructed spill areas, defined below: 


Definition 1 (Well-formed oracle). We write $S \vdash \omega\ \mathsf{wf}$ if oracle $\omega$ is 
well-formed for sheet $S$. An oracle $\omega$ is well-formed if for all addresses $a$ the 
following conditions are satisfied: 


1. If $a \notin \mathrm{dom}(S)$ then $a \notin \mathrm{dom}(\omega)$. 
2. $|\mathit{owners}(\omega, a)| \leq 1$. 
3. If $(a_r, i, j) \in \mathit{owners}(\omega, a)$ and $a \neq a_r$ then $a \notin \mathrm{dom}(S)$. 
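
The three conditions translate directly into a check, sketched below in Python. The representation is an assumption of the sketch: addresses are (row, column) pairs, a sheet is a container of bound addresses, an oracle maps a root to (m, n, permit), `owners` considers only roots whose permit is granted, and the offsets (i, j) are zero-based. Only addresses in the sheet, in the oracle, or inside a permitted area need checking, since every other address satisfies the conditions vacuously.

```python
def area(root, m, n):
    """All addresses covered by an m-by-n array rooted at `root`."""
    r, c = root
    return [(r + i, c + j) for i in range(m) for j in range(n)]


def owners(oracle, a):
    """Triples (root, i, j) such that a permitted root covers `a` at offset (i, j)."""
    found = []
    for (r, c), (m, n, permit) in oracle.items():
        i, j = a[0] - r, a[1] - c
        if permit and 0 <= i < m and 0 <= j < n:
            found.append(((r, c), i, j))
    return found


def well_formed(sheet, oracle):
    candidates = set(sheet) | set(oracle)
    for root, (m, n, permit) in oracle.items():
        if permit:
            candidates.update(area(root, m, n))
    for a in candidates:
        if a not in sheet and a in oracle:
            return False                 # (1) a blank cell is predicted as a root
        owned_by = owners(oracle, a)
        if len(owned_by) > 1:
            return False                 # (2) overlapping spill areas
        if any(root != a and a in sheet for root, _, _ in owned_by):
            return False                 # (3) spill area obstructed by a binding
    return True
```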


The definition of oracle refinement in Figure 6 preserves well-formedness. 
Lemma 1. If $S \vdash \omega\ \mathsf{wf}$ and $S, \omega \Downarrow \gamma$ then $S \vdash \mathit{refine}(S, \omega, \gamma)\ \mathsf{wf}$. 


Producing well-formed oracles alone is insufficient to guarantee convergence. 
Oracle refinement would never reach a consistent state if the predicted spill areas 
were incorrectly sized. 

The definition of oracle refinement in Figure 6 predicts spill areas that are 
correctly sized with respect to the current grid. 


Lemma 2. If $S \vdash \omega\ \mathsf{wf}$ and $S, \omega \Downarrow \gamma$ then $\gamma \models \mathit{refine}(S, \omega, \gamma)$. 


Predicting correctly sized spill areas is also insufficient to guarantee con- 
vergence. Oracle refinement would never reach a consistent state if it oscillates 
between permitting and rejecting the same root to spill. Consider the sheet: 


Let $S \triangleq [\mathrm{A1} \mapsto \{1; 2\},\ \mathrm{B1} \mapsto \mathrm{IF}(\mathrm{A2} = 2, \{3; 4\}, 0)]$ 


Spill iteration would continue indefinitely if refinement cycled between the 
following two well-formed and correctly sized oracles: 


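$$[\mathrm{A1} \mapsto (2, 1, \checkmark)] \longrightarrow_S [\mathrm{A1} \mapsto (2, 1, \times),\ \mathrm{B1} \mapsto (2, 1, \checkmark)] \longrightarrow_S \cdots$$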


To avoid oscillating spill iteration the process of oracle refinement should be 
permit preserving, defined below: 


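Definition 2 (Permit preserving extension). We write $\gamma \vdash \omega \leq \omega'$ if 
oracle $\omega'$ is a permit preserving extension of $\omega$ in context $\gamma$. Defined as: 


$$\gamma \vdash \omega \leq \omega' \;\triangleq\; \forall a, m, n, p.\ (\gamma \models_a \omega \wedge \omega(a) = (m, n, p)) \Rightarrow \omega'(a) = (m, n, p)$$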
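
As a worked instance, consider the two oscillating oracles from the example sheet, writing $\omega$ for the first, $\omega'$ for the second, $\gamma$ for the grid induced by $\omega$, $\gamma \models_a \omega$ for consistency at address $a$, and $\gamma \vdash \omega \leq \omega'$ for permit preserving extension. The array $\{1; 2\}$ bound to A1 has size $(2, 1)$, which is what $\omega$ predicts, so the prediction for A1 is consistent and Definition 2 requires the second oracle to keep it unchanged. It does not:

```latex
\gamma \models_{\mathrm{A1}} \omega, \qquad
\omega(\mathrm{A1}) = (2, 1, \checkmark), \qquad
\omega'(\mathrm{A1}) = (2, 1, \times) \neq (2, 1, \checkmark)
\;\Longrightarrow\;
\gamma \nvdash \omega \leq \omega'
```

The step between these two oracles is therefore ruled out by permit preservation, which is exactly the oscillation the definition is designed to exclude.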


The definition of oracle refinement in Figure 6 is permit preserving. 
Lemma 3. If $S \vdash \omega\ \mathsf{wf}$ and $S, \omega \Downarrow \gamma$ then $\gamma \vdash \omega \leq \mathit{refine}(S, \omega, \gamma)$. 


Spill iteration should be a converging iteration but this cannot be guaranteed 
in general; at any given step in the iteration a sheet can fail to evaluate to a grid. 
This can happen because the sheet contains a cell cycle, spill cycle, or diverging 
grid calculus term. Instead, we only expect that if the sheet is free from these 
divergent scenarios then spill iteration must converge. To allow us to dissect 
different forms of divergence and focus on spill iteration we only consider acyclic 
sheets, defined below: 


Definition 3 (Acyclic). A sheet $S$ is acyclic if for all $\omega$ such that $S \vdash \omega\ \mathsf{wf}$, 
there exists some $\gamma$ such that $S, \omega \Downarrow \gamma$. 


For instance, none of the following sheets are acyclic: $[\mathrm{A1} \mapsto \mathrm{A1}]$ has a 
cell cycle, $[\mathrm{A1} \mapsto \mathrm{B1{:}C1}]$ has a spill cycle, and $[\mathrm{A1} \mapsto \Omega]$ has a formula $\Omega$ 
that diverges. Divergent terms are not encodable in the spill calculus but are 
encodable in the grid calculus, as we show in Section 6.1. An alternative approach 
would be to explicitly model divergence in our semantics of sheet evaluation and 
show that iteration converges or the sheet diverges. We choose not to pursue 
this approach to improve the clarity of our operational semantics, but note that 
our semantics can be extended to model cycles. 

For any acyclic sheet, spill iteration will converge to a final spill oracle. 


Theorem 1 (Convergence). For all acyclic $S$ and $\omega$ such that $S \vdash \omega\ \mathsf{wf}$, 
there exists an oracle $\omega'$ such that $\omega \longrightarrow_S^{*} \omega'$ and $S \vdash \omega'\ \mathsf{final}$. 


Proof. (Sketch; see Appendix B of the extended version [21] for the full proof.) 
The value of any address with a binding is a function of its dependencies and the 
oracle prediction for that address. We inductively define an address as fixed if 
the oracle prediction is consistent for the address, and every address in the 
spill-dependency set (defined in [21]) is fixed. Lemma 3 states that correct predictions 
are always preserved, therefore a fixed address remains fixed through iteration 
and its value remains invariant. The dependency graph of the sheet is acyclic 
therefore if there is a non-fixed address then there must be a non-fixed address 
with no dependencies but an inconsistent oracle prediction—we call this a non- 
fixed source. Lemma 2 states that every new oracle correctly predicts the size 
with respect to the previous grid, therefore any non-fixed sources will be fixed 
in the new oracle. We conclude by observing that the number of fixed addresses 
in the sheet strictly increases at each step, and when every address is fixed the 
oracle is final. 


4.6 Limitations and Differences with Real Systems 


Permit preservation requires that if the size of an array does not change then 
the permit (which may be $\times$) is preserved; this property is crucial for our 
proof of convergence. 

Real spreadsheet systems such as Sheets and Excel do not guarantee permit 
preservation. A root $a$ that is prevented from spilling using a permit $\times$ can 
later be permitted to spill, even if the size of the associated array does not 
change. This particular interaction arises when a root that was previously 
preventing $a$ from spilling changes dimension, freeing a previously occupied spill 
area. Permitting roots to spill into newly freed regions of the grid is desirable 
from a user perspective because it reflects the visual aspect of spreadsheet 
programming where an array will spill into any unoccupied cells. 

A limitation of our formalism, if implemented directly, is that there exist some 
spreadsheets that when evaluated will prevent an array from spilling, despite the 
potential spill area being blank. Consider the sheet: 


$$[\mathrm{A3} \mapsto \{1, 2, 3\},\ \mathrm{C1} \mapsto \mathrm{IF}(\mathrm{ISERROR}(\mathrm{A3}), 0, \{4; 5; 6\})]$$


When the total ordering used by oracle refinement orders A3 before C1 then 
the behaviour is as expected: A3 spills to the right and C1 evaluates to an error 
value. When the total ordering used by oracle refinement orders C1 before A3 
then the behaviour appears peculiar: A3 evaluates to an error value and C1 
evaluates to 0. The root A3 is prevented from spilling despite there appearing 
to be room in the grid! The issue is that the array in A3 never changes size, 
therefore the permit $\times$ assigned to the root is preserved, despite root C1 
relinquishing the spill area on subsequent spill iterations. 
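
The collision behind this example can be checked with a few lines of Python. The sketch only models which root claims the contested cell first under a given ordering; it does not model the later iterations in which C1 changes value. Addresses are (row, column) pairs, and `first_pass_permits` is a hypothetical helper introduced for illustration.

```python
def area(root, rows, cols):
    """Set of addresses covered by a rows-by-cols array rooted at `root`."""
    r, c = root
    return {(r + i, c + j) for i in range(rows) for j in range(cols)}


ROOTS = {
    "A3": ((3, 1), 1, 3),    # {1, 2, 3}: one row, three columns
    "C1": ((1, 3), 3, 1),    # {4; 5; 6}: three rows, one column
}


def first_pass_permits(order):
    """Grant a permit to each root, in the given order, if its area is unclaimed."""
    claimed, permits = set(), {}
    for name in order:
        cells = area(*ROOTS[name])
        permits[name] = claimed.isdisjoint(cells)
        if permits[name]:
            claimed |= cells
    return permits
```

The two areas intersect only at C3, that is `(3, 3)`, so whichever root comes first in the ordering receives the permit and the other is refused.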

The fundamental problem is one of constraint satisfaction. We would like to 
find a well-formed oracle that maximizes the number of roots that can spill in 
a deterministic manner. The total order on addresses ensures determinism but 
restricts the solution space. Our approach could be modified to deterministically 
permute the ordering until an optimal solution is found, however such a method 
would be prohibitively expensive. 

Both Sheets and Excel find the best solution to our example sheet. We expect 
their implementations do not permute a total order on addresses, but implement 
a more efficient algorithm that runs for a bounded time. Finding a more efficient 
algorithm that is guaranteed to terminate remains an open challenge. 

The limitation we present in our formalism only arises when a spreadsheet 
includes dynamic spill collisions and conditional spilling. We anticipate that this 
is a rare use case for spilled arrays, and does not arise when using spilled arrays 
to implement gridlets for live copy-paste-modify. 


5 Grid Calculus: Spill Calculus with Sheets as Values 


In this section we present the grid calculus: a higher-order spreadsheet calculus 
with sheets as values. The grid calculus extends the spill calculus of Section 4. 


5.1 Extending Spreadsheets with Gridlets 


The gridlet concept [12] has been proposed but not implemented. Our observa- 
tion is that spilling a range reference acts much like copy-paste, but lacks local 
modification. We propose to implement gridlets using spilled arrays, by extend- 
ing the spill calculus with primitives that implement first-class grid modification. 
Source range A1:C4 

     A         B           C 
1    “Edge”    “Len.” 
2    “a”       3           B2^2 
3    “b”       4           B3^2 
4    “c”       SQRT(C4)    C2 + C3 

Gridlet invocation in A6 

     A                          B    C 
6    G(A1:C4, B2, 7, B3, 24) 


Revisiting the example from the introduction, there are four key interactions 
happening in the invocation of a gridlet. 


First, select the content in the grid that is to be modified. 
Second, apply the selected modifications or updates. 

Third, calculate the grid using the modified content. 

Fourth and finally, project the calculated content into the grid. 


Spreadsheets with spilled arrays support the final step but lack the capabilities 
to support the first three. We add these capabilities using four new constructs. 


First-class sheet values $\langle S \rangle$. 

Operator GRID that evaluates to the current sheet. 

Operator UPDATE that binds a formula in a sheet-value. 

Operator VIEW that evaluates a given range in a sheet-value to an array. 


Using these constructs we can implement gridlets, for example: 


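$$\mathrm{G}(\mathrm{A1{:}C4}, \mathrm{B2}, 7, \mathrm{B3}, 24) \;\triangleq\; \mathrm{VIEW}(\mathrm{UPDATE}(\mathrm{UPDATE}(\mathrm{GRID}, \mathrm{B2}, 7), \mathrm{B3}, 24), \mathrm{A1{:}C4})$$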
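
To make the composition concrete, here is a small Python model of the gridlet invocation on the numeric cells of the source range. It is a sketch under strong simplifying assumptions: there is no spilling and no oracle, formulas are Python closures that receive a `cell` lookup function, columns are single letters, text labels are omitted, and the helper names `const`, `update`, `view` and `gridlet` are hypothetical. The top-level `SHEET` plays the role of the value returned by GRID.

```python
import math


def const(value):
    """A formula that ignores the grid and returns a constant."""
    return lambda cell: value


SHEET = {
    "B2": const(3), "C2": lambda cell: cell("B2") ** 2,
    "B3": const(4), "C3": lambda cell: cell("B3") ** 2,
    "B4": lambda cell: math.sqrt(cell("C4")),
    "C4": lambda cell: cell("C2") + cell("C3"),
}


def update(sheet, address, formula):
    """UPDATE: return a copy of the sheet with one formula rebound."""
    new_sheet = dict(sheet)
    new_sheet[address] = formula
    return new_sheet


def view(sheet, top_left, bottom_right):
    """VIEW: evaluate every address in the range; blank cells give None."""
    def cell(address):
        formula = sheet.get(address)
        return None if formula is None else formula(cell)

    first_col, first_row = top_left[0], int(top_left[1:])
    last_col, last_row = bottom_right[0], int(bottom_right[1:])
    return [[cell(f"{chr(col)}{row}") for col in range(ord(first_col), ord(last_col) + 1)]
            for row in range(first_row, last_row + 1)]


def gridlet(sheet, top_left, bottom_right, *bindings):
    """G(r, a1, V1, ..., an, Vn) as a chain of updates followed by a view."""
    for address, value in zip(bindings[0::2], bindings[1::2]):
        sheet = update(sheet, address, const(value))
    return view(sheet, top_left, bottom_right)
```

Calling `gridlet(SHEET, "A1", "C4", "B2", 7, "B3", 24)` recomputes the range with edge lengths 7 and 24, while `SHEET` itself is left untouched because `update` copies the bindings.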


Formatting is a core feature of Gridlets, but we omit formatting from the grid 
calculus for clarity, on the basis that it would be a straightforward addition. We 
now describe the details of the grid calculus. 


5.2 Syntax and Operational Semantics 


Figure 7 presents the syntax and operational semantics for the grid calculus. The 
grid calculus does not require modification of existing rules; we only add formula 
evaluation rules for the new constructs, and evaluation relations for views. 


Syntax. Let $x$ range over formula identifiers. Let $F$ range over formulas which 
may additionally be identifiers $x$, $\mathrm{LET}(x, F_1, F_2)$ which binds the result of 
evaluating $F_1$ to $x$ in $F_2$, GRID which captures the current sheet, 
$\mathrm{UPDATE}(F_1, a, F_2)$ which updates a formula binding in a sheet-value, and 
$\mathrm{VIEW}(F, r)$ which extracts a dereferenced range from a sheet-value. Let $V$ range 
over values which may additionally be a sheet-value $\langle S \rangle$. Let $\mathcal{V}$ range over 
views; a view is a sheet with a range, denoted $(S, r)$. A view range $r$ delimits 
the addresses to be computed in sheet $S$. 
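Identifier  $x \in \mathrm{IDENT}$ 


Formula  $F ::= \cdots \mid x \mid \mathrm{LET}(x, F_1, F_2) \mid \mathrm{GRID} \mid \mathrm{UPDATE}(F_1, a, F_2) \mid \mathrm{VIEW}(F, r)$ 
Value    $V ::= \cdots \mid \langle S \rangle$ 
View     $\mathcal{V} ::= (S, r)$ 


Formula evaluation: $S, \omega \vdash F \Downarrow V, D$ 

$$\frac{S, \omega \vdash F_1 \Downarrow V_1, D_1 \qquad S, \omega \vdash F_2[x := V_1] \Downarrow V_2, D_2}{S, \omega \vdash \mathrm{LET}(x, F_1, F_2) \Downarrow V_2, D_1 \cup D_2}$$

$$S, \omega \vdash \mathrm{GRID} \Downarrow \langle S \rangle, \emptyset$$

$$\frac{S, \omega \vdash F_1 \Downarrow \langle S_1 \rangle, D_1}{S, \omega \vdash \mathrm{UPDATE}(F_1, a, F_2) \Downarrow \langle S_1[a \mapsto F_2] \rangle, D_1}$$

$$\frac{S, \omega \vdash F \Downarrow \langle S_1 \rangle, D \qquad (S_1, r) \Downarrow V}{S, \omega \vdash \mathrm{VIEW}(F, r) \Downarrow V, D}$$


View evaluation: $\mathcal{V}, \omega \Downarrow \gamma$ 

$$(S, r), \omega \Downarrow \gamma \;\triangleq\; \forall a \in \mathrm{dom}(S) \cap \mathit{area}(r).\ S, \omega \vdash a \mathbin{!} \gamma(a)$$


Spill iteration: $\omega \longrightarrow_{\mathcal{V}} \omega'$ 

$$\frac{(S, r), \omega \Downarrow \gamma \qquad \gamma \not\models \omega \qquad \mathit{refine}(S, \omega, \gamma) = \omega'}{\omega \longrightarrow_{(S, r)} \omega'}$$

Final oracle: $\mathcal{V} \vdash \omega\ \mathsf{final}$ 

$$\frac{\mathcal{V}, \omega \Downarrow \gamma \qquad \gamma \models \omega}{\mathcal{V} \vdash \omega\ \mathsf{final}}$$


Final view evaluation: $\mathcal{V} \Downarrow V$ 

$$(S, r) \Downarrow V \;\triangleq\; [] \longrightarrow_{(S, r)}^{*} \omega \text{ and } (S, r) \vdash \omega\ \mathsf{final} \text{ and } S, \omega \vdash r \Downarrow V, D$$


Fig. 7. Syntax and Operational Semantics for Grid Calculus (Extends Figures 3-6) 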


Formula evaluation: $S, \omega \vdash F \Downarrow V, D$. A formula $\mathrm{LET}(x, F_1, F_2)$ evaluates in 
the standard way. A formula GRID evaluates to a sheet-value that captures the 
current sheet. A formula $\mathrm{UPDATE}(F_1, a, F_2)$ updates a formula binding in a 
sheet-value. If evaluating formula $F_1$ produces sheet-value $\langle S_1 \rangle$ then 
$\mathrm{UPDATE}(F_1, a, F_2)$ evaluates to the sheet-value where $a$ is bound to $F_2$ in $S_1$, 
denoted $\langle S_1[a \mapsto F_2] \rangle$. A formula $\mathrm{VIEW}(F, r)$ evaluates a sheet-value and 
extracts a range. If evaluating formula $F$ produces sheet-value $\langle S_1 \rangle$ then 
$\mathrm{VIEW}(F, r)$ evaluates to the value obtained by evaluating view $(S_1, r)$. View 
evaluation is defined in Figure 7 and we describe the semantics at the end of the 
section. Here we address a subtle property of VIEW: evaluating a view $(S, r)$ 
adds no dependencies to the containing formula. Dependency tracking in our 
semantics is used to prevent spill cycles and captures dependence between values 
of addresses: the value of a spill root should not depend on the value of an 
address in the spill area. In contrast, sheet-values depend on the formula of an 
address in the containing sheet, but not the value of an address in the 
containing sheet. For example: 


Let $S \triangleq [\mathrm{A1} \mapsto \mathrm{VIEW}(\mathrm{UPDATE}(\mathrm{GRID}, \mathrm{A1}, 10), \mathrm{A2}),\ \mathrm{A2} \mapsto \mathrm{A1}]$ 


Sheet $S$ evaluates to grid $[\mathrm{A1} \mapsto 10, \mathrm{A2} \mapsto 10]$. What are the dependencies of 
each address? The value of A2 in the grid depends on the value of A1 in the grid. 
In contrast, the value of A1 in the grid does not depend on the value of A2 in the 
grid. This is because evaluating the formula in A1 constructs a private grid from 
which the value of A2 is obtained. However, A1 does depend on the formula 
of A2 in the containing grid. Our semantics only considers value dependence, 
therefore the dependency set of A1 is $\emptyset$: the address has no dependence on 
values in the containing grid. 
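
The distinction between value dependence and formula dependence can be reproduced with a small Python model of this sheet. It is a sketch only: there is no spilling or oracle, formulas are closures taking a `cell` lookup and the current `grid` (standing in for GRID), dependencies are collected transitively, and `evaluate`, `update` and `view` are hypothetical helper names.

```python
def evaluate(sheet, address):
    """Return (value, deps); deps are addresses of `sheet` whose values were read."""
    deps = set()

    def cell(a):
        deps.add(a)
        value, inner = evaluate(sheet, a)
        deps.update(inner)
        return value

    return sheet[address](cell, sheet), frozenset(deps)


def update(sheet_value, address, formula):
    """UPDATE: rebind one formula in a copy of the sheet-value."""
    new_sheet = dict(sheet_value)
    new_sheet[address] = formula
    return new_sheet


def view(sheet_value, address):
    """VIEW: evaluate inside the private sheet and discard its dependencies."""
    value, _private_deps = evaluate(sheet_value, address)
    return value


S = {
    "A1": lambda cell, grid: view(update(grid, "A1", lambda c, g: 10), "A2"),
    "A2": lambda cell, grid: cell("A1"),
}
```

Evaluating A1 reads A2 only inside the private sheet built by `update`, so no address of the containing sheet is recorded; evaluating A2 reads A1 through `cell` and therefore records it.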

Formula dependence is vital for efficient recalculation, though we do not 
model that in our semantics and only use dependency tracking to prevent spill 
cycles. If an address depends on the value of another address bound in a sheet, 
then it also depends on the formula of that address. The converse is not true in 
the presence of sheet-values. 


View evaluation: $\mathcal{V}, \omega \Downarrow \gamma$. Evaluation of view $(S, r)$ with oracle $\omega$ is defined 
in a similar manner as evaluation of sheets, however the induced grid $\gamma$ is 
limited to the sheet bindings that intersect the range $r$. There are two key 
consequences that arise from limiting the induced grid. First, we only evaluate 
the bindings in $S$ required to evaluate the bindings in $r$. Second, only roots that 
are within range $r$ are permitted to spill; any root that is outside $r$ remains as 
an address containing a collapsed array. There is a difference between an address 
that holds a collapsed array and a root that is prevented from spilling an array 
by permit $\times$. The former has a pre-spill and post-spill value that is an array; 
the latter has a pre-spill value that is an array and a post-spill value that is 
an error. 


Spill iteration: $\omega \longrightarrow_{\mathcal{V}} \omega'$. The definition of spill iteration for views is the same 
as spill iteration for sheets, except that we use view evaluation rather than sheet 
evaluation. 


Final oracle: $\mathcal{V} \vdash \omega\ \mathsf{final}$. The definition of a final oracle for views is the same as 
a final oracle for sheets, except that we use view evaluation rather than sheet 
evaluation. 


Final view evaluation: $\mathcal{V} \Downarrow V$. Evaluating a view $(S, r)$ computes a final oracle 
for the view and then evaluates range $r$ in the context of sheet $S$. Final view 
evaluation will evaluate range $r$, rather than extracting values from an induced 
grid, because viewing a range should sample all values in the range, including 
blank cells. If we extract values from the induced grid we can only obtain the 
values for addresses with a binding in $r$. 
5.3 Formulas for Gridlets 


We can encode the G operator using primitives from the grid calculus. 


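$$[\![\mathrm{G}(r, a_1, V_1, \ldots, a_n, V_n)]\!] = \mathrm{VIEW}([\![(a_1, V_1, \ldots, a_n, V_n)]\!], r)$$

$$[\![(a_1, V_1)]\!] = \mathrm{UPDATE}(\mathrm{GRID}, a_1, V_1)$$

$$[\![(a_1, V_1, \ldots, a_{n+1}, V_{n+1})]\!] = \mathrm{UPDATE}([\![(a_1, V_1, \ldots, a_n, V_n)]\!], a_{n+1}, V_{n+1})$$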


The G operator translates to the VIEW operator, and any bindings translate to 
a sequence of UPDATE operations. The initial sheet-value is obtained from the 
context using the GRID operator. 
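
The translation is a left fold over the bindings, which the following Python sketch renders as a function from the arguments of G to the text of the resulting formula. The function name `translate_g` and the choice to produce a string are assumptions of the sketch; as in the definition, at least one binding is required.

```python
def translate_g(range_ref, *bindings):
    """Translate G(r, a1, V1, ..., an, Vn) into nested UPDATE calls under a VIEW."""
    if not bindings or len(bindings) % 2 != 0:
        raise ValueError("expected one or more (address, value) pairs")
    term = "GRID"
    for address, value in zip(bindings[0::2], bindings[1::2]):
        term = f"UPDATE({term}, {address}, {value})"
    return f"VIEW({term}, {range_ref})"
```

For the running example, `translate_g("A1:C4", "B2", 7, "B3", 24)` yields the formula given in Section 5.1.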

The translation illustrates that G is not higher-order because every applica- 
tion returns the value obtained by evaluating a view on a sheet-value. A language 
that only provides G does not permit sheet-values to escape and be manipulated 
by formulas. This is acceptable when emulating copy-paste because a copy is 
always taken with respect to the top-level sheet, however this does limit the 
usefulness of G as an implementation construct. This limitation motivates the 
design of the grid calculus; as we show in the next section, the grid calculus is 
capable of encoding other language features. 


6 Encoding Objects, Lambdas, and Functions 


In this section we give three encodings that target the grid calculus: objects, 
lambdas, and sheet-defined functions. 


6.1 Encoding the Abadi and Cardelli Object Calculus 


We introduce the grid calculus to implement gridlets and the concept of live 
copy-paste. Perhaps surprisingly, the grid calculus can encode object-oriented 
programming, in particular the untyped object calculus of Abadi and Cardelli [1]. 
Their calculus is a tiny object-based programming language, akin to a prototype- 
based language such as Self [6], but capable of representing class-based object- 
oriented programming via encodings. 

We draw a precise analogy between spreadsheets and objects. A sheet is like 
an object. A cell is like a method name. A formula in a cell is like a method 
implementation. The GRID operator is like the this keyword. Formula update is 
like method update. 

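We assume an isomorphism between method names $\ell$ and cell addresses $a$ 
and use $\ell$ in both the object calculus and grid calculus. We define the translation 
of object calculus terms to grid calculus formulas, denoted $[\![b]\!]$, as follows: 


$$[\![x]\!] = x$$

$$[\![[\ell_i = \varsigma(x_i) b_i^{\,i \in 1..n}]]\!] = \langle [\ell_i \mapsto [\![\varsigma(x_i) b_i]\!]^{\,i \in 1..n}] \rangle$$

$$[\![b.\ell]\!] = \mathrm{VIEW}([\![b]\!], \ell)$$

$$[\![b_1.\ell \Leftarrow \varsigma(x) b_2]\!] = \mathrm{UPDATE}([\![b_1]\!], \ell, [\![\varsigma(x) b_2]\!])$$

$$[\![\varsigma(x) b]\!] = \mathrm{LET}(x, \mathrm{GRID}, [\![b]\!])$$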
The translation makes our analogy concrete. We use the LET formula to lexically 
capture self identifiers. The grid calculus allows the construction of diverging 
formulas, as discussed in Section 4.5. We demonstrate this using a diverging 
object calculus term. 


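$$\Omega \;\triangleq\; [\![[\mathrm{A1} = \varsigma(z)\, z.\mathrm{A1}].\mathrm{A1}]\!] = \mathrm{VIEW}(\langle [\mathrm{A1} \mapsto \mathrm{LET}(z, \mathrm{GRID}, \mathrm{VIEW}(z, \mathrm{A1}))] \rangle, \mathrm{A1})$$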


The operational semantics are preserved by the translation. We assume a 
big-step relation for object calculus terms, denoted $b \Downarrow o$. The proof is in 
Appendix C of the extended version [21]. 


Theorem 2. If $b$ is closed and $b \Downarrow o$ then $[], [] \vdash [\![b]\!] \Downarrow [\![o]\!], \emptyset$. 


6.2 Encoding the Lambda Calculus 


We give an encoding of the lambda calculus that is inspired by the object calculus 
embedding of the lambda calculus. We use ARG1 to hold the argument and 
VAL1 to hold the result of a lambda. In spreadsheet languages both ARG1 and 
VAL1 are legal cell addresses; for example, address ARG1 denotes the cell at 
column 1151 and row 1. 


$$[\![x]\!] = x$$

$$[\![\lambda x.M]\!] = \mathrm{UPDATE}(\mathrm{GRID}, \mathrm{VAL1}, \mathrm{LET}(x, \mathrm{VIEW}(\mathrm{GRID}, \mathrm{ARG1}), [\![M]\!]))$$

$$[\![M\ N]\!] = \mathrm{VIEW}(\mathrm{UPDATE}([\![M]\!], \mathrm{ARG1}, [\![N]\!]), \mathrm{VAL1})$$
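
Both the column arithmetic and the encoding are easy to check mechanically. The Python sketch below computes a column number from its letters using the usual bijective base-26 reading (A = 1, ..., Z = 26, AA = 27), and translates lambda terms into the text of a grid calculus formula. The tuple representation of terms, `("var", x)`, `("lam", x, body)` and `("app", m, n)`, and the function names are assumptions made for illustration.

```python
def column_number(letters):
    """Convert column letters to a 1-based column number (bijective base 26)."""
    number = 0
    for ch in letters.upper():
        if not "A" <= ch <= "Z":
            raise ValueError(f"not a column letter: {ch!r}")
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number


def encode(term):
    """Translate a lambda term into the text of a grid calculus formula."""
    kind = term[0]
    if kind == "var":
        return term[1]
    if kind == "lam":
        _, x, body = term
        return f"UPDATE(GRID, VAL1, LET({x}, VIEW(GRID, ARG1), {encode(body)}))"
    if kind == "app":
        _, function, argument = term
        return f"VIEW(UPDATE({encode(function)}, ARG1, {encode(argument)}), VAL1)"
    raise ValueError(f"unknown term: {term!r}")
```

Here `column_number("ARG")` is 1 * 676 + 18 * 26 + 7 = 1151, matching the column stated for ARG1.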


6.3 Encoding Sheet-Defined Functions 


A sheet-defined function [14, 17, 19, 20] is a mechanism for a user to author a 
function using a region of a spreadsheet. We can model a sheet-defined function 
$f$ as a triple $(S, (a_0, \ldots, a_n), r)$ that consists of the moat or sheet-bindings for 
the function, the addresses from the moat that denote arguments, and the range 
from the moat that denotes the result. The application $f(V_0, \ldots, V_n)$ can be 
encoded in the grid calculus as follows, where $f = (S, (a_0, \ldots, a_n), r)$: 


$$[\![f(V_0, \ldots, V_n)]\!] = \mathrm{VIEW}([\![(V_0, \ldots, V_n)]\!], r)$$

$$[\![()]\!] = \langle S \rangle$$

$$[\![(V_0, \ldots, V_{n'+1})]\!] = \mathrm{UPDATE}([\![(V_0, \ldots, V_{n'})]\!], a_{n'+1}, V_{n'+1})$$


7 Related Work 


Formal Semantics of Spreadsheets. Our core calculus is similar to previous 
formalisms for spreadsheets. Several previous works [3, 7, 14, 19] offer formal 
semantics for spreadsheet fragments. Mokhov et al. [16] capture the logic of 
recalculating dependent cells. Finally, Bock et al. [4] provide a cost semantics 
for evaluation of spreadsheet formulas. 
Spilling. Major spreadsheet implementations like Sheets⁶ and Excel⁷ implement 
spilled arrays [11], but do not document details of the implementation. In [17], 
the authors propose a spilling-like mechanism that allows matrix values in cells 
to spread across a predefined range; this is closely related to “Ctrl+Shift+Enter” 
formulas⁸ in Excel. The proposal in [17] is significantly simpler than spilled 
arrays because the dimension of the spilled area is fixed and declared ahead of 
time. Sarkar et al. [18] note that spilled arrays violate Kay’s value principle [13] 
because a user is unable to edit constituent cells, except for the spill root. 


Extending the Spreadsheet Paradigm. Clack and Braine [8] propose a spreadsheet 
based on a combination of functional and object-oriented programming. Their 
integration is different from our analogy: in their system, a class is a collection 
of parameterised worksheets, and a parameterised worksheet corresponds to a 
method. In gridlets, the grid corresponds to an object and cells on the grid 
correspond to methods of the object. 


Similarity Inheritance in Forms/3. Forms/3 [5] is a visual programming lan- 
guage that borrows the key concept of cell from spreadsheets. Instead of a tab- 
ular sheet, cells in Forms/3 are arranged on a form: a canvas with no structure. 
Forms/3 explored an abstraction model called “similarity inheritance” through 
which a form may borrow cells from another form and optionally modify at- 
tributes of certain cells. This resembles substitution in gridlets, however reusing 
a portion of the tabular grid and spilling into adjacent cells are primary to 
gridlets, whereas such notions are absent from Forms/3. 


Sheet-defined Functions. Sheet-defined functions [17] (SDFs) allow the user to 
reuse logic defined using formulas in the grid. The user nominates input cells, an 
output cell, and gives the function a name. When the function is called, a virtual 
copy of the workbook is instantiated. Arguments to the function are placed in 
the input cells, the virtual workbook is calculated, and the result from the output 
cell is returned. 

Elastic SDFs [14] generalize SDFs to handle input arrays of arbitrary size. 
In [4], the authors provide a precise semantics for SDFs, closures and array 
formulas, but not for spilling. Gridlets are more general than SDFs and give 
greater flexibility to the user: each gridlet invocation can have a unique set 
of local substitutions, whereas all calls to an SDF share the same arguments. 


Error prevention and Error detection. Abraham and Erwig propose type systems 
for error detection [3] and automatic model inference [2]. Abraham and Erwig [3] 
provide an operational semantics for sheets that is similar to the core calculus 
in Section 3, but they do not give a semantics for spilled arrays. 

Gencel [10] is a typed “template language” that describes the layout of a 
desired worksheet along with a set of customized update operations that are 
specific to the particular template. The type system guarantees that the 
restricted set of update operations keeps the desired worksheet free from 
omission, reference, and type errors. 


6 https://support.google.com/docs/answer/6208276?hl=en 
7 https://aka.ms/excel-dynamic-arrays 
8 https://aka.ms/excel-cse-formulas 

Cheng and Rival [7] use abstract interpretation to detect formula errors due 
to mismatch in type. Their technique also incorporates analysis of associated 
programs, such as VBA scripts, along with formulas on the grid. 


8 Conclusion 


Repetition is common in programming—spreadsheets are no different. The dis- 
tinguishing property of spreadsheets is that reuse includes formatting and layout, 
and is not limited to formula logic. Gridlets [12] are a high-level re-use abstrac- 
tion for spreadsheets. In this work we give the first semantics of gridlets as a 
formula. Our approach comes in two stages. 

First, we make sense of spilled arrays, a feature that is available in major 
spreadsheet implementations but not previously formalised. The concept is sim- 
ple and belies the many subtleties involved in implementing spilled arrays. We 
present the spill calculus as a concise description of spilling in spreadsheets. 

Second, we extend the spill calculus with the tools to implement gridlets. The 
grid calculus introduces the concept of first-class sheet values, and describes the 
semantics of three higher-order operators that emulate copy-paste-modify. The 
composition of these operators gives the semantics for gridlet operator G. 

Spreadsheet programming bears a resemblance to object-oriented program- 
ming, alluded to often in the literature. We show that the resemblance runs deep 
by giving an encoding of the object calculus into the grid calculus, with a direct 
parallel between objects and sheets. 